Convolutional Neural Networks

Project: Write an Algorithm for a Dog Identification App


In this notebook, some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this project. You will not need to modify the included code beyond what is requested. Sections that begin with '(IMPLEMENTATION)' in the header indicate that the following block of code will require additional functionality which you must provide. Instructions will be provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!

Note: Once you have completed all of the code implementations, you need to finalize your work by exporting the Jupyter Notebook as an HTML document. Before exporting the notebook to html, all of the code cells need to have been run so that reviewers can see the final implementation and output. You can then export the notebook by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question X' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.

The rubric contains optional "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. If you decide to pursue the "Stand Out Suggestions", you should include the code in this Jupyter notebook.


Why We're Here

In this notebook, you will make the first steps towards developing an algorithm that could be used as part of a mobile or web app. At the end of this project, your code will accept any user-supplied image as input. If a dog is detected in the image, it will provide an estimate of the dog's breed. If a human is detected, it will provide an estimate of the dog breed that is most resembling. The image below displays potential sample output of your finished project (... but we expect that each student's algorithm will behave differently!).

Sample Dog Output

In this real-world setting, you will need to piece together a series of models to perform different tasks; for instance, the algorithm that detects humans in an image will be different from the CNN that infers dog breed. There are many points of possible failure, and no perfect algorithm exists. Your imperfect solution will nonetheless create a fun user experience!

The Road Ahead

We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.

  • Step 0: Import Datasets
  • Step 1: Detect Humans
  • Step 2: Detect Dogs
  • Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
  • Step 4: Create a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 5: Write your Algorithm
  • Step 6: Test Your Algorithm

Step 0: Import Datasets

Make sure that you've downloaded the required human and dog datasets:

  • Download the dog dataset. Unzip the folder and place it in this project's home directory, at the location /dog_images.

  • Download the human dataset. Unzip the folder and place it in the home directory, at location /lfw.

Note: If you are using a Windows machine, you are encouraged to use 7zip to extract the folder.

In the code cell below, we save the file paths for both the human (LFW) dataset and dog dataset in the numpy arrays human_files and dog_files.

In [1]:
import matplotlib.pyplot as plt                        
%matplotlib inline 

import numpy as np
from glob import glob
import os
import time
from tqdm import tqdm
#from time import time
import copy
import cv2                

# load filenames for human and dog images
# human_files = np.array(glob("/data/lfw/*/*"))
# dog_files = np.array(glob("/data/dog_images/*/*/*"))

# have copied the images to my local workspace to allow for removing corrupted images, 
# so set up directory variables to allow for loading from either original or workspace
orig_dir = "/data/"
work_dir = "/home/workspace/data/"
project_dir = "/home/workspace/dog_project/"

human_files = np.array(glob(orig_dir + 'lfw/*/*'))
dog_files   = np.array(glob(orig_dir + 'dog_images/*/*/*'))

# print number of images in each dataset
print('There are %d total human images.' % len(human_files))
print('There are %d total dog images.' % len(dog_files))
There are 13233 total human images.
There are 8351 total dog images.
In [2]:
# global variables used throughout...
model_name = ""
use_weights = False
In [3]:
import torch
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
Out[3]:
device(type='cpu')
In [4]:
# see https://github.com/pytorch/pytorch/issues/7068
import random
SEED = 0
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
np.random.seed(SEED)
random.seed(SEED)
torch.backends.cudnn.deterministic=True
In [5]:
def get_learning_rate(optimizer):
    lr=[]
    for param_group in optimizer.param_groups:
       lr +=[ param_group['lr'] ]
    return lr
In [6]:
def unfreeze(model):
    for name, child in model.named_children():
        for param in child.parameters():
            param.requires_grad = True
        unfreeze(child)
In [7]:
def freeze(model):
    for name, child in model.named_children():
        for param in child.parameters():
            param.requires_grad = False
        freeze(child)
In [8]:
def compare_dicts(dict_1, dict_2):
    dicts_differ = 0
    for key_item_1, key_item_2 in zip(dict_1.items(), dict_2.items()):
        if torch.equal(key_item_1[1], key_item_2[1]):
            pass
        else:
            dicts_differ += 1
            if (key_item_1[0] == key_item_2[0]):
                print('Mismatch found at', key_item_1[0])
            else:
                raise Exception
                
    if dicts_differ == 0:
        print('State_Dicts match')

In [9]:
def reloadModel(model, optimizer, state_dicts_name='model_d_502.pth', 
                learning_rate=1e-05, isV1=False, resetarray=True):
    
    global train_losses, valid_losses, val_acc_history, lr_hist
    global best_acc, best_acc_epoch, best_val_epoch, epoch_loss_min

    state_dicts = torch.load(state_dicts_name, map_location=lambda storage, loc: storage)   

    model_statedict = state_dicts['model_statedict']
    optimizer_statedict = state_dicts['optimizer_statedict']
        
    if isV1:
        model_dict = model.state_dict()
        # 1. filter out unnecessary keys
        pretrained_dict = {k: v for k, v in model_statedict.items() if k in model_dict}
        # 2. overwrite entries in the existing state dict
        model_dict.update(pretrained_dict) 
        # 3. load the new state dict
        model.load_state_dict(pretrained_dict)
    else:    
        model.load_state_dict(model_statedict)
        
    parameters = filter(lambda p: p.requires_grad, model.parameters())
    optimizer  = torch.optim.Adam(parameters, lr=learning_rate)
    
    optimizer.load_state_dict(optimizer_statedict)    

    best_acc_epoch = state_dicts['best_acc_epoch']
    best_val_epoch = state_dicts['best_val_epoch']
    
    if resetarray:
        train_losses = state_dicts['train_losses'][:best_acc_epoch]
        valid_losses = state_dicts['valid_losses'][:best_acc_epoch]
        val_acc_history = state_dicts['val_acc_history'][:best_acc_epoch]
        #train_acc_history = state_dicts['train_acc_history'][:best_acc_epoch]
        
        best_acc = val_acc_history[-1]
        epoch_loss_min = valid_losses[-1]
    else:
        train_losses = state_dicts['train_losses']
        valid_losses = state_dicts['valid_losses']
        val_acc_history = state_dicts['val_acc_history']
        #train_acc_history = state_dicts['train_acc_history']
        
        best_acc = val_acc_history[best_acc_epoch]
        epoch_loss_min = valid_losses[best_val_epoch]

Preparation for predictions: Utility functions

The following image-handling utility code is adapted from open-source code found on the internet or in Udacity-provided notebooks...

In [10]:
def imshow(image, ax=None, title=None, color="black", filename=None, normalize=True):
    """Imshow for Tensor."""
    
    global img_means, img_std
    
    if img_means is not None:
        imgmeans = img_means
    else: 
        imgmeans = [0.485, 0.456, 0.406]

    if img_std is not None:
        imgstd = img_std
    else: 
        imgstd = [0.229, 0.224, 0.225]
        
    if isinstance(image, torch.Tensor): 
        image = image.cpu()
        image = copy.deepcopy(image.numpy())
        
    if ax is None:
        fig, ax = plt.subplots()
    
    # PyTorch tensors assume the color channel is the first dimension,
    # but matplotlib assumes it is the third dimension
    if isinstance(image, np.ndarray):     
        image = image.transpose((1, 2, 0))
    else:
        image = np.asarray(image).transpose((1, 2, 0))
    
    
    if normalize:
        # Undo preprocessing
        mean = np.array(imgmeans)
        std = np.array(imgstd)
        image = std * image + mean
        # Image needs to be clipped between 0 and 1 or it looks like noise when displayed
        image = np.clip(image, 0, 1)

    if title is not None:
        ax.set_title(title, color=color)
    if filename is not None: 
        ax.set_xlabel(filename)
        
    ax.imshow(image)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.tick_params(axis='both', length=0)
    ax.set_xticklabels('')
    ax.set_yticklabels('')
    
    return ax
In [11]:
def img_resize(img, sz):
    ''' Resize an image so its shorter side equals the passed size, preserving the aspect ratio
    '''
    w, h = img.size
    aspect_ratio = w / h if h < w else h / w
    width  = int(sz) if w < h else int(round(sz * aspect_ratio, 0))
    height = int(sz) if w > h else int(round(sz * aspect_ratio, 0))

    return img.resize((width, height))

def img_crop(img, sz):
    ''' Return a cropped square region from the centre of an image
    '''
    w, h = img.size
    x = (w - sz) / 2
    y = (h - sz) / 2
    x1 = x + sz
    y1 = y + sz

    return img.crop((x, y, x1, y1))

def img_process(img_path, img_sz=224, max_sz=None):
    ''' 
    Scale, crop and normalize an image, returning it as a numpy array
    '''
    global img_means, img_std
    
    if img_means is not None:
        imgmeans = img_means
    else: 
        imgmeans = [0.485, 0.456, 0.406]

    if img_std is not None:
        imgstd = img_std
    else: 
        imgstd = [0.229, 0.224, 0.225]
        
    if max_sz is None:
        max_sz = img_sz + (img_sz // 7)
        
    # Open the image
    from PIL import Image
    img = Image.open(img_path)
    
    # Resize image so its shortest side is max_sz (256 is standard for a 224 crop; 341 for 299)
    img = img_resize(img, max_sz)
    
    # Crop image
    img = img_crop(img, img_sz)

    # Normalize
    img = np.array(img) / 255
    
    mean = np.array(imgmeans) # provided mean or ImageNet mean
    std = np.array(imgstd)    # provided std or ImageNet std
    
    img = (img - mean) / std
    
    # Pytorch requires color channels in the first dimension (opposite of PIL)
    img = img.transpose((2, 0, 1))
    
    return img
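As a sanity check, the normalisation applied in img_process and the un-normalisation performed in imshow should be exact inverses. A minimal numpy sketch of that pair (using the ImageNet defaults, channels-last as in img_process before the transpose; these helper names are illustrative only):

```python
import numpy as np

# ImageNet defaults, as used when img_means / img_std are None
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """Channels-last (H, W, 3) image in [0, 1] -> normalized image."""
    return (img - mean) / std

def denormalize(img):
    """Inverse of normalize(), as performed in imshow()."""
    return std * img + mean
```

Round-tripping an image through normalize then denormalize recovers the original to floating-point precision, which is why displayed images look correct after clipping to [0, 1].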

A dictionary of the full set of ImageNet class labels has been downloaded into the workspace so that class indices can be matched to names...

In [12]:
os.path.isfile(project_dir + "imagenet1000_clsidx_to_labels.txt")
Out[12]:
True
In [13]:
ImageNetDict = eval(open(project_dir + "imagenet1000_clsidx_to_labels.txt").read())
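Note that eval() will execute any Python code found in the file. Since the file is just a dict literal, ast.literal_eval from the standard library is a safer drop-in, as it only accepts Python literals and never executes code. A sketch (load_class_dict is a hypothetical helper name, not part of the project code):

```python
import ast

def load_class_dict(path):
    """Parse a file containing a Python dict literal.

    Safer than eval(): ast.literal_eval raises on anything
    that is not a plain literal, so no code can be executed.
    """
    with open(path) as f:
        return ast.literal_eval(f.read())
```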

Set the image mean and std to specific values (see calculations later in the notebook) if the ImageNet defaults are not wanted...

In [14]:
# Allows for specific calculation of image means and standard deviation, 
# defaults are the imagenet values...
img_means = None
img_std = None
In [15]:
img_means = [0.4868, 0.4666, 0.3972]
img_std = [0.2605, 0.2551, 0.2609]

To allow for long-running processes (e.g. network training), import workspace_utils. The %load magic command is used to load it into the next cell.

In [ ]:
os.path.isfile(project_dir + "workspace_utils.py")
In [16]:
# %load workspace_utils.py
import signal
 
from contextlib import contextmanager
 
import requests
 
 
DELAY = INTERVAL = 4 * 60  # interval time in seconds
MIN_DELAY = MIN_INTERVAL = 2 * 60
KEEPALIVE_URL = "https://nebula.udacity.com/api/v1/remote/keep-alive"
TOKEN_URL = "http://metadata.google.internal/computeMetadata/v1/instance/attributes/keep_alive_token"
TOKEN_HEADERS = {"Metadata-Flavor":"Google"}
 
 
def _request_handler(headers):
    def _handler(signum, frame):
        requests.request("POST", KEEPALIVE_URL, headers=headers)
    return _handler
 
 
@contextmanager
def active_session(delay=DELAY, interval=INTERVAL):
    """
    Example:
 
    from workspace_utils import active_session
 
    with active_session():
        # do long-running work here
    """
    token = requests.request("GET", TOKEN_URL, headers=TOKEN_HEADERS).text
    headers = {'Authorization': "STAR " + token}
    delay = max(delay, MIN_DELAY)
    interval = max(interval, MIN_INTERVAL)
    original_handler = signal.getsignal(signal.SIGALRM)
    try:
        signal.signal(signal.SIGALRM, _request_handler(headers))
        signal.setitimer(signal.ITIMER_REAL, delay, interval)
        yield
    finally:
        signal.signal(signal.SIGALRM, original_handler)
        signal.setitimer(signal.ITIMER_REAL, 0)
 
 
def keep_awake(iterable, delay=DELAY, interval=INTERVAL):
    """
    Example:
 
    from workspace_utils import keep_awake
 
    for i in keep_awake(range(5)):
        # do iteration with lots of work here
    """
    with active_session(delay, interval): yield from iterable

Step 1: Detect Humans

In this section, we use OpenCV's implementation of Haar feature-based cascade classifiers to detect human faces in images.

OpenCV provides many pre-trained face detectors, stored as XML files on github. We have downloaded one of these detectors and stored it in the haarcascades directory. In the next code cell, we demonstrate how to use this detector to find human faces in a sample image.

In [17]:
import cv2                
import matplotlib.pyplot as plt                        
%matplotlib inline                               
In [18]:
# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# load color (BGR) image
img = cv2.imread(human_files[0])
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# find faces in image
faces = face_cascade.detectMultiScale(gray)

# print number of faces detected in the image
print('Number of faces detected:', len(faces))

# get bounding box for each detected face
for (x,y,w,h) in faces:
    # add bounding box to color image
    cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
    
# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()
Number of faces detected: 1

Before using any of the face detectors, it is standard procedure to convert the images to grayscale. The detectMultiScale function executes the classifier stored in face_cascade and takes the grayscale image as a parameter.

In the above code, faces is a numpy array of detected faces, where each row corresponds to a detected face. Each detected face is a 1D array with four entries that specifies the bounding box of the detected face. The first two entries in the array (extracted in the above code as x and y) specify the horizontal and vertical positions of the top left corner of the bounding box. The last two entries in the array (extracted here as w and h) specify the width and height of the box.
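To illustrate the box format, the (x, y, w, h) rows can be converted into the (x1, y1, x2, y2) corner pairs that cv2.rectangle expects. A small numpy sketch (boxes_to_corners is a hypothetical helper, not part of the project code):

```python
import numpy as np

def boxes_to_corners(faces):
    """Convert rows of (x, y, w, h) into (x1, y1, x2, y2) corners."""
    faces = np.atleast_2d(np.asarray(faces))
    corners = faces.copy()
    corners[:, 2] = faces[:, 0] + faces[:, 2]  # x2 = x + w
    corners[:, 3] = faces[:, 1] + faces[:, 3]  # y2 = y + h
    return corners
```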

Write a Human Face Detector

We can use this procedure to write a function that returns True if a human face is detected in an image and False otherwise. This function, aptly named face_detector, takes a string-valued file path to an image as input and appears in the code block below.

In [19]:
# returns "True" if face is detected in image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

(IMPLEMENTATION) Assess the Human Face Detector

Question 1: Use the code cell below to test the performance of the face_detector function.

  • What percentage of the first 100 images in human_files have a detected human face?
  • What percentage of the first 100 images in dog_files have a detected human face?

Ideally, we would like 100% of human images with a detected face and 0% of dog images with a detected face. You will see that our algorithm falls short of this goal, but still gives acceptable performance. We extract the file paths for the first 100 images from each of the datasets and store them in the numpy arrays human_files_short and dog_files_short.

Answer: (You can print out your results and/or write your percentages in this cell)

In [20]:
from tqdm import tqdm

human_files_short = human_files[:100]
dog_files_short = dog_files[:100]

#-#-# Do NOT modify the code above this line. #-#-#

## TODO: Test the performance of the face_detector algorithm 
## on the images in human_files_short and dog_files_short.
def countFaces(faceList):
    return [face_detector(img) for img in faceList]

human_faces = countFaces(human_files_short)
dog_faces = countFaces(dog_files_short)

print("{}% of human faces detected".format(sum(human_faces)))
print("{}% of dogs detected as having human faces!".format(sum(dog_faces)))
98% of human faces detected
17% of dogs detected as having human faces!
CPU times: user 1min 30s, sys: 708 ms, total: 1min 30s
Wall time: 1min 31s
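The percentage prints above are only correct because each list holds exactly 100 files, so the count of detections equals the percentage. A hypothetical helper (not part of the project code) that generalises to any sample size:

```python
def detection_rate(flags):
    """Percentage of True values in a list of detector results."""
    return 100.0 * sum(flags) / len(flags)
```

With this, `detection_rate(human_faces)` would give the same 98.0 regardless of how many files were tested.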

In the first 100 images from each set, 98% of the human faces were detected, while 17% of the dogs were incorrectly detected as having human faces.

In [21]:
bad_human_faces = [a for a, b in zip(human_files_short, human_faces) if b == False]
bad_dog_faces = [a for a, b in zip(dog_files_short, dog_faces) if b == True]
In [22]:
len(bad_human_faces), len(bad_dog_faces)
Out[22]:
(2, 17)
In [23]:
for ii in range(len(bad_human_faces)):
    print(bad_human_faces[ii])  
/data/lfw/Julianne_Moore/Julianne_Moore_0002.jpg
/data/lfw/Clive_Lloyd/Clive_Lloyd_0001.jpg
In [24]:
_, axes = plt.subplots(figsize=(20,6), ncols=2)

for ii in range(2):
    ax = axes[ii]
    img = cv2.imread(bad_human_faces[ii])
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    ax.imshow(cv_rgb)
In [25]:
for ii in range(len(bad_dog_faces)):
    print(bad_dog_faces[ii]) 
/data/dog_images/train/103.Mastiff/Mastiff_06844.jpg
/data/dog_images/train/103.Mastiff/Mastiff_06841.jpg
/data/dog_images/train/103.Mastiff/Mastiff_06860.jpg
/data/dog_images/train/103.Mastiff/Mastiff_06834.jpg
/data/dog_images/train/103.Mastiff/Mastiff_06829.jpg
/data/dog_images/train/103.Mastiff/Mastiff_06856.jpg
/data/dog_images/train/103.Mastiff/Mastiff_06872.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04181.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04209.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04207.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04180.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04186.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04191.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04189.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04214.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04157.jpg
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04162.jpg
In [26]:
_, axes = plt.subplots(figsize=(20,6), ncols=5)

for ii in range(5):
    ax = axes[ii]
    img = cv2.imread(bad_dog_faces[ii])
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    for (x,y,w,h) in faces:
        cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)    
    ax.imshow(cv_rgb)
In [27]:
testdogs = [orig_dir + 'dog_images/test/004.Akita/Akita_00258.jpg', 
            orig_dir + 'dog_images/test/005.Alaskan_malamute/Alaskan_malamute_00330.jpg', 
            orig_dir + 'dog_images/train/103.Mastiff/Mastiff_06834.jpg', 
            orig_dir + 'dog_images/test/004.Akita/Akita_00282.jpg']
_, axes = plt.subplots(figsize=(20,6), ncols=4)
for ii in range(4):
    ax = axes[ii]
    img = cv2.imread(testdogs[ii])
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    for (x,y,w,h) in faces:
        cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)    
    ax.imshow(cv_rgb)
In [28]:
figsize = (20, 10)

fig = plt.figure(figsize=figsize)
    
img = cv2.imread(testdogs[2])
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray)
for (x,y,w,h) in faces:
    cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

plt.imshow(cv_rgb)
Out[28]:
<matplotlib.image.AxesImage at 0x7fbd30baa748>
In [29]:
figsize = (20, 10)

fig = plt.figure(figsize=figsize)
    
img = cv2.imread(testdogs[3])
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray)
for (x,y,w,h) in faces:
    cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

plt.imshow(cv_rgb)
Out[29]:
<matplotlib.image.AxesImage at 0x7fbd30b0df28>

Examination of a few images suggests that failed human-face detections may be due to poor resolution, focus, contrast or colour in the photos, or to the subject being in profile or otherwise obscured. Conversely, some dog photos may genuinely contain human faces, and dogs facing the camera front-on with clearly discernible features can be interpreted as having a human face. Geometric shapes that resemble two eyes and a mouth can also trigger a detection, and in some cases there is no understandable reason for the detection at all!


We suggest the face detector from OpenCV as a potential way to detect human images in your algorithm, but you are free to explore other approaches, especially approaches that make use of deep learning :). Please use the code cell below to design and test your own face detection algorithm. If you decide to pursue this optional task, report performance on human_files_short and dog_files_short.

In [30]:
### (Optional) 
### TODO: Test performance of another face detection algorithm.
### Feel free to use as many code cells as needed.

Further explore the data

In [31]:
data_dir  = orig_dir + 'dog_images/'
train_dir = data_dir + 'train/'
valid_dir = data_dir + 'valid/'
test_dir  = data_dir + 'test/'
In [32]:
from pathlib import Path
import os

def folders_in_path(path):
    if not Path.is_dir(path):
        raise ValueError("argument is not a directory")
    yield from filter(Path.is_dir, path.iterdir())

def folders_in_depth(path, depth):
    if 0 > depth:
        raise ValueError("depth smaller than 0")
    if 0 == depth:
        yield from folders_in_path(path)
    else:
        for folder in folders_in_path(path):
            yield from folders_in_depth(folder, depth-1)

def files_in_path(path):
    if not Path.is_dir(path):
        raise ValueError("argument is not a directory")
    yield from filter(Path.is_file, path.iterdir())
In [33]:
def files_per_folder(dir_path, dir_desc):
    files_per_folder = []
    for folder in folders_in_depth(Path.cwd()/dir_path,0):
        files = list(files_in_path(folder))
        foldername = os.path.basename(os.path.normpath(folder))
        files_per_folder.append((foldername, len(files)))

    print(dir_desc)
    print("-" * 48)
    for image in files_per_folder:
        img_name = image[0]+' '*50
        img_name = img_name[:45]
        img_count = image[1]
        print('{} {}'.format(img_name, img_count))
In [ ]:
files_per_folder(train_dir, "Training data files per class:")
In [ ]:
files_per_folder(valid_dir, "Validation data files per class:")
In [ ]:
files_per_folder(test_dir, "Testing data files per class:")
In [ ]:
files_per_folder(orig_dir + "lfw/", "Human image counts:")

Step 2: Detect Dogs

In this section, we use a pre-trained model to detect dogs in images.

Obtain Pre-trained VGG-16 Model

The code cell below downloads the VGG-16 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories.

In [38]:
import torch
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Out[38]:
device(type='cuda')
In [39]:
import torch
import torchvision.models as models

# define VGG16 model
VGG16 = models.vgg16(pretrained=True)

# check if CUDA is available
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# move model to GPU if CUDA is available
if use_cuda:
    VGG16 = VGG16.cuda()
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.torch/models/vgg16-397923af.pth
100%|██████████| 553433881/553433881 [00:05<00:00, 103063893.55it/s]
In [40]:
device

Ensure the model is in evaluation mode...

In [41]:
VGG16.eval();
In [42]:
print(VGG16.classifier)
Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace)
  (2): Dropout(p=0.5)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace)
  (5): Dropout(p=0.5)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)

Given an image, this pre-trained VGG-16 model returns a prediction (derived from the 1000 possible categories in ImageNet) for the object that is contained in the image.

Add a class-to-name mapper to the VGG16 model using the previously created ImageNetDict

In [ ]:
ImageNetDict
In [44]:
VGG16.class_to_name = ImageNetDict

Preparation for prediction: Calculate image means and standard deviation

The ImageNet defaults are probably quite sufficient for normalisation, since these are ImageNet images. But as they are a subset, it's perhaps worth experimenting with calculating the means and std from just these images, and in any case it's a useful tool to have.

The first approach I developed myself (informed by Prof Google); the second approach comes from the Udacity student hub. I am using the results from my own approach, as the standard deviation figures differ and mine seem more realistic to me. However, I will revisit this if I find a reliable definition for calculating the std.

In [ ]:
# Allows for specific calculation of image means and standard deviation, 
# defaults are the imagenet values...
img_means = None
img_std = None

Set manually to the results obtained below in order to skip the processing step...

After running the next cell, jump to "Skip to here to avoid normalization calculations" if you want to avoid the next few cells...

In [45]:
img_means = [0.4868, 0.4666, 0.3972]
img_std = [0.2605, 0.2551, 0.2609]

Using ImageFile.LOAD_TRUNCATED_IMAGES = True and num_workers = 0 to prevent an error when loading the corrupted image train/098.Leonberger/Leonberger_06571.jpg. However, this was very slow (6 minutes instead of 1), so ideally the image should be deleted...

In [46]:
if os.path.isfile(train_dir + '098.Leonberger/Leonberger_06571.jpg'):
    print('Removing',train_dir + '098.Leonberger/Leonberger_06571.jpg')
    os.remove(train_dir + '098.Leonberger/Leonberger_06571.jpg')
else:
    print(train_dir + '098.Leonberger/Leonberger_06571.jpg was not found')  
Removing /data/dog_images/train/098.Leonberger/Leonberger_06571.jpg
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-15-b2f56f0101d0> in <module>()
      1 if os.path.isfile(train_dir + '098.Leonberger/Leonberger_06571.jpg'):
      2     print('Removing',train_dir + '098.Leonberger/Leonberger_06571.jpg')
----> 3     os.remove(train_dir + '098.Leonberger/Leonberger_06571.jpg')
      4 else:
      5     print(train_dir + '098.Leonberger/Leonberger_06571.jpg was not found')

OSError: [Errno 30] Read-only file system: '/data/dog_images/train/098.Leonberger/Leonberger_06571.jpg'

In the Dog Project Workspace the bad image cannot be deleted. For now I am processing with LOAD_TRUNCATED_IMAGES = True...

In [47]:
from PIL import Image
from PIL import ImageFile 
# Try using with num_workers = 0 to handle loading of corrupted files...
ImageFile.LOAD_TRUNCATED_IMAGES = True
import torch
import torchvision.transforms as transforms
In [48]:
from torchvision import datasets, transforms
import os

image_size = 224
# Setting num_workers to zero in the project workspace
# Anything higher gets RuntimeError: DataLoader worker (pid 83) is killed by signal: Bus error.
num_workers = 0

tfms_basic = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor()
])

dataset1 = datasets.ImageFolder(train_dir, transform=tfms_basic)
dataloader1 = torch.utils.data.DataLoader(dataset1, num_workers=num_workers, batch_size=128)
dataset2 = datasets.ImageFolder(valid_dir, transform=tfms_basic)
dataloader2 = torch.utils.data.DataLoader(dataset2, num_workers=num_workers, batch_size=128)

Method 1 to calculate the actual image means and standard deviation of the dog images

In [49]:
%%time

# Get the mean and standard deviation of all images in train and valid datasets
red_chan = []
gre_chan = []
blu_chan = []

for images, _ in dataloader1:
    for image in images:
        red_chan.append(image[0])
        gre_chan.append(image[1])
        blu_chan.append(image[2])
        
for images, _ in dataloader2:
    for image in images:
        red_chan.append(image[0])
        gre_chan.append(image[1])
        blu_chan.append(image[2])    
        
red_channels = torch.cat(red_chan, dim=0)
green_channels = torch.cat(gre_chan, dim=0)
blue_channels = torch.cat(blu_chan, dim=0) 
CPU times: user 1min 56s, sys: 17 s, total: 2min 13s
Wall time: 6min 13s
In [50]:
img_means = round(red_channels.mean().item(), 4), round(green_channels.mean().item(), 4), round(blue_channels.mean().item(), 4)
img_means = list(img_means)
img_means
Out[50]:
[0.4868, 0.4666, 0.3972]
In [51]:
img_std = round(red_channels.std().item(), 4), round(green_channels.std().item(), 4), round(blue_channels.std().item(), 4)
img_std = list(img_std)
img_std
Out[51]:
[0.2605, 0.2551, 0.2609]

Results of calculating means and std:

img_means = [0.4868, 0.4666, 0.3972]
img_std = [0.2605, 0.2551, 0.2609]

Compare with ImageNet defaults:

imgmeans = [0.485, 0.456, 0.406]
imgstd = [0.229, 0.224, 0.225]

Method 2 from the Udacity Student Hub gives a different result for the std...

In [52]:
accumulated = torch.from_numpy(np.zeros((3, image_size * image_size))).float()
In [53]:
%%time

for data, *_ in dataset1:
    modified = data.view(3, -1)
    accumulated.add_(modified)
    
for data, *_ in dataset2:
    modified = data.view(3, -1)
    accumulated.add_(modified)  

means = accumulated.mean(dim=1) / (len(dataset1) + len(dataset2))
stds  = accumulated.std(dim=1)  / (len(dataset1) + len(dataset2))

Method 2 results:

means = [0.4861, 0.4560, 0.3918]
stds = [0.0070, 0.0189, 0.0104]
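As a cross-check (a sketch, not part of the original notebook), the per-channel mean and std can be computed exactly in one streaming pass using sums and sums of squares. This avoids both the memory cost of Method 1 and the statistical issue in Method 2, which takes the std across pixel positions of the summed image rather than across all pixels. The `channel_stats` helper name is mine:

```python
import torch

def channel_stats(datasets):
    """Exact per-channel mean/std over all pixels, via sums and sums of squares."""
    n_pixels = 0
    s = torch.zeros(3)   # per-channel sum of pixel values
    sq = torch.zeros(3)  # per-channel sum of squared pixel values
    for ds in datasets:
        for data, *_ in ds:          # same (tensor, label) unpacking as above
            flat = data.reshape(3, -1)   # (3, H*W)
            n_pixels += flat.shape[1]
            s += flat.sum(dim=1)
            sq += flat.pow(2).sum(dim=1)
    mean = s / n_pixels
    std = (sq / n_pixels - mean.pow(2)).sqrt()  # population std
    return mean, std
```

Called as `channel_stats([dataset1, dataset2])`, this should agree with Method 1's result up to rounding.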

Skip to here to avoid normalization calculations


(IMPLEMENTATION) Making Predictions with a Pre-trained Model

In the next code cell, you will write a function that accepts a path to an image (such as 'dogImages/train/001.Affenpinscher/Affenpinscher_00001.jpg') as input and returns the index corresponding to the ImageNet class that is predicted by the pre-trained VGG-16 model. The output should always be an integer between 0 and 999, inclusive.

Before writing the function, make sure that you take the time to learn how to appropriately pre-process tensors for pre-trained models in the PyTorch documentation.

In [54]:
from PIL import Image
from PIL import ImageFile 
ImageFile.LOAD_TRUNCATED_IMAGES = True

import torchvision.transforms as transforms
import torch.nn.functional as F

def VGG16_predict(img_path, topk=1):
    '''
    Use pre-trained VGG-16 model to obtain the index corresponding to the
    predicted ImageNet class for the image at the specified path
    
    Args:
        img_path: path to an image
        topk: number of top predictions to return (e.g. 5)
        
    Returns:
        Indices of the top-k predicted ImageNet classes,
        their class names, and their probabilities
    '''
    
    ## TODO: Complete the function.
    ## Load and pre-process an image from the given img_path
    ## Return the *index* of the predicted class for that image
    img = img_process(img_path)
    img_tensor = torch.from_numpy(img).type(torch.FloatTensor)
    img_tensor.unsqueeze_(0)

    VGG16.cpu()
    VGG16.eval()
    
    with torch.no_grad():
        ps = F.softmax(VGG16(img_tensor), dim=1)   # class probabilities
        probs, classes = torch.topk(ps, k=topk)

        probs = probs.view(topk).numpy().tolist()
        classes = classes.view(topk).numpy().tolist()
        classnames = [VGG16.class_to_name[cls] for cls in classes]
        
    return classes, classnames, probs 
In [55]:
data_dir  = orig_dir + 'dog_images/'
train_dir = data_dir + 'train/'
valid_dir = data_dir + 'valid/'
test_dir  = data_dir + 'test/'
In [56]:
imshow(img_process(train_dir+'001.Affenpinscher/Affenpinscher_00001.jpg'), title='Affenpinscher')
Out[56]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8875d7b1d0>
In [57]:
VGG16_predict(train_dir+'001.Affenpinscher/Affenpinscher_00001.jpg')
Out[57]:
([252], ['affenpinscher, monkey pinscher, monkey dog'], [0.9544302225112915])

From the dictionary: 252: 'affenpinscher, monkey pinscher, monkey dog',

(IMPLEMENTATION) Write a Dog Detector

While looking at the dictionary, you will notice that the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151-268, inclusive, to include all categories from 'Chihuahua' to 'Mexican hairless'. Thus, in order to check to see if an image is predicted to contain a dog by the pre-trained VGG-16 model, we need only check if the pre-trained model predicts an index between 151 and 268 (inclusive).

Use these ideas to complete the dog_detector function below, which returns True if a dog is detected in an image (and False if not).

In [58]:
### returns "True" if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    ## TODO: Complete the function.
    prediction = VGG16_predict(img_path)
    
    dog_detected = (prediction[0][0] in range(151, 269))
    
    return dog_detected
In [59]:
dog_detector(train_dir+'001.Affenpinscher/Affenpinscher_00001.jpg')
Out[59]:
True

(IMPLEMENTATION) Assess the Dog Detector

Question 2: Use the code cell below to test the performance of your dog_detector function.

  • What percentage of the images in human_files_short have a detected dog?
  • What percentage of the images in dog_files_short have a detected dog?

Answer:

In [60]:
%%time

### TODO: Test the performance of the dog_detector function
### on the images in human_files_short and dog_files_short.
def countDogs(ImageList):
    return [dog_detector(img) for img in ImageList]

human_dogs = countDogs(human_files_short)
dog_dogs = countDogs(dog_files_short)

# the short file lists contain 100 images each, so the sum is already a percentage;
# computing it explicitly keeps the figure correct for other list lengths
print("{}% of dogs detected in human data".format(100 * sum(human_dogs) // len(human_dogs)))
print("{}% of dogs detected in dogs data".format(100 * sum(dog_dogs) // len(dog_dogs)))
1% of dogs detected in human data
99% of dogs detected in dogs data
CPU times: user 2min 29s, sys: 19.7 s, total: 2min 49s
Wall time: 2min 50s
In [61]:
bad_human_dogs = [a for a, b in zip(human_files_short, human_dogs) if b]
bad_dog_dogs = [a for a, b in zip(dog_files_short, dog_dogs) if not b]
In [62]:
for ii in range(len(bad_human_dogs)):
    print(bad_human_dogs[ii])  
/data/lfw/Perri_Shaw/Perri_Shaw_0001.jpg
In [63]:
VGG16_predict(bad_human_dogs[0])
Out[63]:
([233], ['Bouvier des Flandres, Bouviers des Flandres'], [0.12169972062110901])
In [64]:
_, axes = plt.subplots(figsize=(20,6), ncols=2)

for ii in range(2):
    ax = axes[ii]
    if ii == 0:
        img = cv2.imread(bad_human_dogs[0])
    else:
        img = cv2.imread(test_dir+'033.Bouvier_des_flandres/Bouvier_des_flandres_02305.jpg') 
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    ax.imshow(cv_rgb)
In [65]:
for ii in range(len(bad_dog_dogs)):
    print(bad_dog_dogs[ii])  
/data/dog_images/train/059.Doberman_pinscher/Doberman_pinscher_04191.jpg
In [66]:
plt.subplots(figsize=(20,6), ncols=1)

img = cv2.imread(bad_dog_dogs[0])
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(cv_rgb)
Out[66]:
<matplotlib.image.AxesImage at 0x7fbcbe0a3b38>

We suggest VGG-16 as a potential network to detect dog images in your algorithm, but you are free to explore other pre-trained networks (such as Inception-v3, ResNet-50, etc). Please use the code cell below to test other pre-trained PyTorch models. If you decide to pursue this optional task, report performance on human_files_short and dog_files_short.

In [67]:
### (Optional) 
### TODO: Report the performance of another pre-trained network.
### Feel free to use as many code cells as needed.

Fix bad image - approach 1:

Set data to use the workspace versions of the files, then the bad image can be deleted...

  • Files must first be copied into appropriate sub-directories...
In [68]:
data_dir  = work_dir + 'dog_images/'
train_dir = data_dir + 'train/'
valid_dir = data_dir + 'valid/'
test_dir  = data_dir + 'test/'
In [69]:
if os.path.isfile(train_dir + '098.Leonberger/Leonberger_06571.jpg'):
    print('Removing',train_dir + '098.Leonberger/Leonberger_06571.jpg')
    os.remove(train_dir + '098.Leonberger/Leonberger_06571.jpg')
else:
    print(train_dir + '098.Leonberger/Leonberger_06571.jpg was not found')  
Removing /home/workspace/data//dog_images/train/098.Leonberger/Leonberger_06571.jpg

Fix bad image - approach 2:

Set data to use the original versions of the files, retain bad image by ImageFile.LOAD_TRUNCATED_IMAGES

In [70]:
from PIL import Image
from PIL import ImageFile 
ImageFile.LOAD_TRUNCATED_IMAGES = True

data_dir  = orig_dir + 'dog_images/'
train_dir = data_dir + 'train/'
valid_dir = data_dir + 'valid/'
test_dir  = data_dir + 'test/'

Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

Now that we have functions for detecting humans and dogs in images, we need a way to predict breed from images. In this step, you will create a CNN that classifies dog breeds. You must create your CNN from scratch (so, you can't use transfer learning yet!), and you must attain a test accuracy of at least 10%. In Step 4 of this notebook, you will have the opportunity to use transfer learning to create a CNN that attains greatly improved accuracy.

We mention that the task of assigning breed to dogs from images is considered exceptionally challenging. To see why, consider that even a human would have trouble distinguishing between a Brittany and a Welsh Springer Spaniel.

Brittany Welsh Springer Spaniel

It is not difficult to find other dog breed pairs with minimal inter-class variation (for instance, Curly-Coated Retrievers and American Water Spaniels).

Curly-Coated Retriever American Water Spaniel

Likewise, recall that labradors come in yellow, chocolate, and black. Your vision-based algorithm will have to conquer this high intra-class variation to determine how to classify all of these different shades as the same breed.

Yellow Labrador Chocolate Labrador Black Labrador

We also mention that random chance presents an exceptionally low bar: setting aside the fact that the classes are slightly imbalanced, a random guess will provide a correct answer roughly 1 in 133 times, which corresponds to an accuracy of less than 1%.

Remember that the practice is far ahead of the theory in deep learning. Experiment with many different architectures, and trust your intuition. And, of course, have fun!

Use either the work directory, or the original directory with LOAD_TRUNCATED_IMAGES enabled

In [71]:
data_dir  = work_dir + 'dog_images/'
train_dir = data_dir + 'train/'
valid_dir = data_dir + 'valid/'
test_dir  = data_dir + 'test/'
In [72]:
data_dir  = orig_dir + 'dog_images/'
train_dir = data_dir + 'train/'
valid_dir = data_dir + 'valid/'
test_dir  = data_dir + 'test/'

(IMPLEMENTATION) Specify Data Loaders for the Dog Dataset

Use the code cell below to write three separate data loaders for the training, validation, and test datasets of dog images (located at dogImages/train, dogImages/valid, and dogImages/test, respectively). You may find this documentation on custom datasets to be a useful resource. If you are interested in augmenting your training and/or validation data, check out the wide variety of transforms!

In [73]:
import os
import torch
from torchvision import datasets, transforms
from PIL import Image
from PIL import ImageFile 
ImageFile.LOAD_TRUNCATED_IMAGES = True

### TODO: Write data loaders for training, validation, and test sets
## Specify appropriate transforms, and batch_sizes

batch_size = 32 
num_workers = 0   # Using 0 in Dog Project Workspace, otherwise 4
image_size = 224  # 180  

# Use the dataset statistics computed above when available,
# otherwise fall back to the ImageNet defaults
if img_means is not None:
    imgmeans = img_means
else:
    imgmeans = [0.485, 0.456, 0.406]

if img_std is not None:
    imgstd = img_std
else:
    imgstd = [0.229, 0.224, 0.225]
        
data_transforms = {
    'train': transforms.Compose([transforms.RandomAffine(15, translate=(0.1, 0.1), scale=(1.0, 1.5), 
                                                         shear=None, resample=Image.BILINEAR, 
                                                         fillcolor=0),
                                 transforms.Resize(image_size + (image_size//7), 
                                                   interpolation=Image.BILINEAR),
                                 transforms.CenterCrop(image_size),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ColorJitter(brightness=0.3, contrast=0.3, 
                                                        saturation=0.2, hue=0.05), 
                                 transforms.ToTensor(),
                                 transforms.Normalize(imgmeans, imgstd)
                                ]), 
    'valid': transforms.Compose([transforms.Resize(image_size + (image_size//7)),
                                 transforms.CenterCrop(image_size),
                                 transforms.ToTensor(),
                                transforms.Normalize(imgmeans, imgstd)
                                ]),
    'test' : transforms.Compose([transforms.Resize(image_size + (image_size//7)),
                                 transforms.CenterCrop(image_size),
                                 transforms.ToTensor(),
                                 transforms.Normalize(imgmeans, imgstd)
                                ])
}    

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'valid', 'test']}

class_names = image_datasets['train'].classes
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'valid', 'test']}

loaders_scratch = {x: torch.utils.data.DataLoader(image_datasets[x], 
                                              batch_size=batch_size,
                                              shuffle=(x != 'test'), 
                                              num_workers=num_workers)
              for x in ['train', 'valid', 'test']}
In [ ]:
class_names
In [75]:
class_len = len(class_names)
class_len
Out[75]:
133
In [76]:
dataset_sizes
Out[76]:
{'train': 6679, 'valid': 835, 'test': 836}

Create a tensor of class weights in case I want to cope with class imbalances...

In [77]:
from collections import defaultdict
class_counts = defaultdict(int)
for _, c in image_datasets["train"].imgs:
    class_counts[c] += 1

class_weights = [1-(float(class_counts[class_id])/len(image_datasets["train"].imgs))
                 for class_id in range(len(image_datasets["train"].classes))]
class_weights = torch.FloatTensor(class_weights)
class_weights.to(device)
Out[77]:
tensor([ 0.9904,  0.9913,  0.9922,  0.9906,  0.9885,  0.9904,  0.9925,
         0.9901,  0.9949,  0.9925,  0.9901,  0.9901,  0.9931,  0.9897,
         0.9891,  0.9912,  0.9907,  0.9925,  0.9928,  0.9907,  0.9904,
         0.9930,  0.9903,  0.9907,  0.9945,  0.9939,  0.9904,  0.9948,
         0.9889,  0.9922,  0.9916,  0.9903,  0.9933,  0.9904,  0.9921,
         0.9903,  0.9925,  0.9915,  0.9897,  0.9921,  0.9897,  0.9906,
         0.9925,  0.9904,  0.9921,  0.9900,  0.9919,  0.9919,  0.9925,
         0.9925,  0.9907,  0.9927,  0.9930,  0.9915,  0.9925,  0.9903,
         0.9894,  0.9925,  0.9930,  0.9910,  0.9909,  0.9921,  0.9921,
         0.9942,  0.9937,  0.9951,  0.9949,  0.9906,  0.9924,  0.9930,
         0.9907,  0.9928,  0.9937,  0.9939,  0.9934,  0.9904,  0.9936,
         0.9940,  0.9912,  0.9931,  0.9916,  0.9909,  0.9931,  0.9925,
         0.9945,  0.9921,  0.9901,  0.9924,  0.9921,  0.9913,  0.9915,
         0.9934,  0.9948,  0.9934,  0.9927,  0.9936,  0.9925,  0.9933,
         0.9937,  0.9949,  0.9928,  0.9957,  0.9913,  0.9937,  0.9954,
         0.9925,  0.9931,  0.9961,  0.9933,  0.9951,  0.9934,  0.9919,
         0.9942,  0.9948,  0.9906,  0.9955,  0.9928,  0.9921,  0.9954,
         0.9942,  0.9958,  0.9952,  0.9934,  0.9925,  0.9949,  0.9955,
         0.9939,  0.9955,  0.9928,  0.9934,  0.9955,  0.9961,  0.9955], device='cuda:0')

Question 3: Describe your chosen procedure for preprocessing the data.

  • How does your code resize the images (by cropping, stretching, etc)? What size did you pick for the input tensor, and why?
  • Did you decide to augment the dataset? If so, how (through translations, flips, rotations, etc)? If not, why not?

Answer:

The data loaders process images via PyTorch datasets, which apply PyTorch transforms to the images.

Each transform resizes images to a size slightly larger than the target, which allows a subsequent center crop to reduce the image to the desired dimensions.

A custom normalization can be supplied via the variables img_means and img_std, but the ImageNet normalization is used by default if these variables are None.

Additionally, the train transform employs image augmentation: RandomAffine for rotating and stretching the image, and ColorJitter to introduce color and brightness variations. This increases the variation of the training images, which helps prevent overfitting.

An initial image size of 180x180 was chosen because a smaller size allows for larger batch sizes and faster processing, which is important when testing a network trained from scratch. This was subsequently adjusted up to 224x224 once a useful architecture was found, to conform with the expected ImageNet size.

A tensor of class weights was created in case it is needed to offset class imbalances.

I use num_workers=0 in the Dog Project workspace because a higher setting causes a memory allocation error. On my personal system I use 4, which loads the data three to four times faster (the GPU is also faster on the PC)...


Preparation and notes for the CNN construction...

I have implemented an adaptive pooling approach for the classifier, which allows me to simply specify the desired output size of the classifier without having to calculate and hard-wire the incoming tensor size. See Jeremy Howard's nn tutorial for a mention of this, where he says "replace nn.AvgPool2d with nn.AdaptiveAvgPool2d, which allows us to define the size of the output tensor we want, rather than the input tensor we have. As a result, our model will work with any size input".

The advantage of this approach is not only that any image size can be used freely, but also that image sizes can be altered during training. This has been noted to wake up the learning process and allow another step of model improvement, reducing overfitting. I used this to good effect in my model for the PyTorch Challenge.

I have created a version of the approach taken by fast.ai, where a class AdaptiveConcatPool2d is defined that combines an AdaptiveAvgPool2d with an AdaptiveMaxPool2d. This has the advantage of allowing the model to learn from maximum values as well as average ones. See the more detailed discussions in the following links.

In [78]:
# Adapted from fastai...
import torch.nn as nn

class AdaptiveConcatPool2d(nn.Module):
    def __init__(self, sz=None):
        super().__init__()
        sz = sz or (1,1)
        self.ap = nn.AdaptiveAvgPool2d(sz)
        self.mp = nn.AdaptiveMaxPool2d(sz)
    def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)
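A quick sanity check that the pooled output size is fixed regardless of the input resolution (the class is restated here so the snippet is self-contained):

```python
import torch
import torch.nn as nn

class AdaptiveConcatPool2d(nn.Module):
    """Concatenated adaptive max- and average-pooling (restated from above)."""
    def __init__(self, sz=None):
        super().__init__()
        sz = sz or (1, 1)
        self.ap = nn.AdaptiveAvgPool2d(sz)
        self.mp = nn.AdaptiveMaxPool2d(sz)
    def forward(self, x):
        return torch.cat([self.mp(x), self.ap(x)], 1)

pool = AdaptiveConcatPool2d()
for size in (180, 224, 299):
    out = pool(torch.randn(2, 128, size, size))
    # output is always (2, 256, 1, 1): channels double, spatial dims are fixed
    print(tuple(out.shape))
```

This is what lets the image size vary during training without touching the classifier's linear layers.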
In [79]:
class Flatten(nn.Module):
    def __init__(self):
        super(Flatten, self).__init__()

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return x

(IMPLEMENTATION) Model Architecture

Create a CNN to classify dog breed. Use the template in the code cell below.

3 layer CNN with starting kernel 5 and 48-96-128 node layers

  • this was the most successful architecture in terms of accuracy gained per epoch
  • this model's state dict is saved as model_scratch.pt
  • a chart of results from this model is printed below
  • see Appendix for the output of the first 300 epochs
  • see Appendix for other models tried
In [80]:
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    ### TODO: choose an architecture, and complete the class
    def __init__(self, num_classes=133):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=5, stride=1, padding=2, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(48, 96, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=1, padding=0)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(96, 128, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=1, padding=0)
        )
        self.classifier = nn.Sequential(
            AdaptiveConcatPool2d(),
            Flatten(),
            nn.BatchNorm1d(128*2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=128*2, out_features=512, bias=True),
            nn.ReLU(),
            nn.BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=512, out_features=num_classes, bias=True)
        )    
        
        
    def forward(self, x):
        ## Define forward behavior
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.classifier(x)

        return x

#-#-# You do NOT have to modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()
In [81]:
print(model_scratch)
Net(
  (layer1): Sequential(
    (0): Conv2d(3, 48, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(48, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
  )
  (layer3): Sequential(
    (0): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=(1, 1))
      (mp): AdaptiveMaxPool2d(output_size=(1, 1))
    )
    (1): Flatten()
    (2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25)
    (4): Linear(in_features=256, out_features=512, bias=True)
    (5): ReLU()
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.25)
    (8): Linear(in_features=512, out_features=133, bias=True)
  )
)

The summary below requires torchsummary...

  • install torchsummary if you want to execute the summary cell
In [82]:
!pip install torchsummary
from torchsummary import summary
Collecting torchsummary
  Downloading https://files.pythonhosted.org/packages/7d/18/1474d06f721b86e6a9b9d7392ad68bed711a02f3b61ac43f13c719db50a6/torchsummary-1.5.1-py3-none-any.whl
Installing collected packages: torchsummary
Successfully installed torchsummary-1.5.1
In [83]:
summary(model_scratch, (3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 48, 224, 224]           3,600
              ReLU-2         [-1, 48, 224, 224]               0
         MaxPool2d-3         [-1, 48, 112, 112]               0
            Conv2d-4         [-1, 96, 112, 112]          41,472
              ReLU-5         [-1, 96, 112, 112]               0
         MaxPool2d-6         [-1, 96, 111, 111]               0
            Conv2d-7        [-1, 128, 111, 111]         110,592
              ReLU-8        [-1, 128, 111, 111]               0
         MaxPool2d-9        [-1, 128, 110, 110]               0
AdaptiveMaxPool2d-10            [-1, 128, 1, 1]               0
AdaptiveAvgPool2d-11            [-1, 128, 1, 1]               0
AdaptiveConcatPool2d-12            [-1, 256, 1, 1]               0
          Flatten-13                  [-1, 256]               0
      BatchNorm1d-14                  [-1, 256]             512
          Dropout-15                  [-1, 256]               0
           Linear-16                  [-1, 512]         131,584
             ReLU-17                  [-1, 512]               0
      BatchNorm1d-18                  [-1, 512]           1,024
          Dropout-19                  [-1, 512]               0
           Linear-20                  [-1, 133]          68,229
================================================================
Total params: 357,013
Trainable params: 357,013
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 104.65
Params size (MB): 1.36
Estimated Total Size (MB): 106.59
----------------------------------------------------------------

Question 4: Outline the steps you took to get to your final CNN architecture and your reasoning at each step.

Answer:

This was very much a trial and error process, and even then I know there must be many better ideas than the ones I settled on. I am not yet familiar enough with the domain to know exactly what I should do...

First I set up a fairly deep network based on the initial layers of ResNet-18, but found it was not progressing well at all, so I went back to basics and tried the following configurations, on the basis that increasing CNN complexity and regularization should be beneficial up to a point, after which training the network requires more resources and knowledge than I can apply, and there is no further benefit.

Configurations assessed:

  • 3 layers: 32, 64, 96 node configuration
  • 3 layers: 32, 64, 96 node configuration and batch normalization
  • 3 layers: 48, 96, 128 node configuration
  • 4 layers: 32, 64, 96, 128 node configuration
  • 6 layers: 32 & 32, 64 & 64, 96 & 96 node configuration and batch normalization,
    • i.e. 3 layers each with 2 x Conv2d per layer

Summary of Results:
The best performing network for accuracy was the 48, 96, 128 node network, described below. Some key points:

  • The smaller configuration of 32, 64, 96 trained more quickly and was only slightly worse (27% vs 29% test accuracy after 300 epochs)
  • Batch normalization did not improve accuracy; rather, it smoothed the relationship between training and validation loss, and in fact took longer to reach the same accuracy
  • The 4 layer configuration failed to make any useful progress; the 6 layer (3 x 2) configuration took much longer to train and did not perform as well, though with more training it might have kept improving.

Kernel Size:
Kernel size of the first layer was tested at 3, 5, and 7. A single kernel of size 3 did not perform as well as a single kernel of size 5 in these configurations, but size 7 was very difficult to train, needing more resources than I had to make progress with it. Two stacked size 3 kernels also demanded more resources than I had available - though this may have been more to do with other aspects of that model's architecture.

Discussions 1, 2 regarding kernel size indicate that two stacked 3x3 kernels are a standard and better choice than a single 5x5 kernel, so given time and more understanding I would like to revisit this and resolve the issues I had with that architecture.
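The usual argument in those discussions: for C input and output channels, a single 5x5 convolution costs 25*C^2 weights, while two stacked 3x3 convolutions cost 18*C^2 for the same 5x5 receptive field, with an extra non-linearity in between. A quick check (the helper name is mine):

```python
def conv_weights(c_in, c_out, k, bias=False):
    """Parameter count of a Conv2d layer: c_in * c_out * k * k (+ biases)."""
    return c_in * c_out * k * k + (c_out if bias else 0)

c = 96
single_5x5 = conv_weights(c, c, 5)        # 25 * c^2 = 230400
stacked_3x3 = 2 * conv_weights(c, c, 3)   # 18 * c^2 = 165888
print(single_5x5, stacked_3x3)
```

The same formula reproduces the torchsummary figures above, e.g. layer1's Conv2d has 3 * 48 * 5 * 5 = 3,600 parameters.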

Architecture of 48, 96, 128 CNN:

Net(
  (layer1): Sequential(
    (0): Conv2d(3, 48, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(48, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
  )
  (layer3): Sequential(
    (0): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=(1, 1))
      (mp): AdaptiveMaxPool2d(output_size=(1, 1))
    )
    (1): Flatten()
    (2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25)
    (4): Linear(in_features=256, out_features=512, bias=True)
    (5): ReLU()
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.25)
    (8): Linear(in_features=512, out_features=133, bias=True)
  )
)

Explanation of architecture:

  • Reading about CNN architecture suggested that a 3 layer network would be a minimum, so that was my starting configuration. After assessing variations including 3 layers of 2 convolutions and 4 layers of single convolutions, I settled on a simple 3 layer model.
  • Kernel size in the first layer was set to 5 (though kernel sizes of 3 and 2x3 were also tested), as I wanted to capture enough features of the images while still having a network trainable within my resources.
  • With a kernel size of 5 I used padding of 2 in the Conv2d layer; this is required to sample the image fully with this kernel size.
  • Subsequent Conv2d layers were given a kernel size of 3 - this is a standard part of the image processing pipeline.
  • In keeping with the requirements of a neural network, every convolution is followed by a non-linearity, in this case the ReLU activation function.
  • Each layer finishes with max pooling using MaxPool2d, a standard step to down-sample the data as it moves through the network.
  • Although this network doesn't use it, I evaluated batch normalization on each layer, but as noted above it did not provide a useful benefit, so I omitted it.
  • The classifier is described in detail above in the section "Preparation and notes for the CNN construction" - I have used a variation of adaptive average pooling that allows me to configure the classifier in terms of its outputs, for flexibility in using different image sizes. This has the downstream advantages of allowing me to train initially on smaller images (and consequently larger batch sizes) while developing an understanding of the network, and later of preventing overfitting by introducing gradual image size increases to obtain greater accuracy.
  • Furthermore, by combining AdaptiveAvgPool2d with an AdaptiveMaxPool2d (wrapped in the class AdaptiveConcatPool2d), the model can learn from maximum values as well as average values.
  • The Flatten layer of the classifier makes the data digestible for the linear layers.
  • Batch normalization and dropout are added to regularize the linear classifier layers, preventing overfitting.
  • I didn't wrap the output in a softmax as I intended to use the CrossEntropyLoss loss function, which applies log-softmax internally.
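The padding and pooling choices above follow from the standard output-size formula out = floor((in + 2*padding - kernel)/stride) + 1. A small sketch (the helper name is mine) reproducing the sizes shown in the torchsummary output:

```python
def conv_out_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a Conv2d/MaxPool2d layer (PyTorch docs formula)."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# layer1: 5x5 conv with padding 2 preserves 224; the 2x2/stride-2 pool halves it
print(conv_out_size(224, kernel=5, padding=2))   # 224
print(conv_out_size(224, kernel=2, stride=2))    # 112
# layer2/3 pools use stride 1, so each shrinks the feature map by one pixel
print(conv_out_size(112, kernel=2, stride=1))    # 111
```

With padding = (kernel - 1) // 2 a stride-1 convolution preserves the spatial size, which is why the 5x5 kernel gets padding 2 and the 3x3 kernels get padding 1.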

Set use_weights to True to use the class weights to compensate for class imbalance

In [84]:
use_weights = False

In [85]:
learning_rate = 1e-5

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_scratch, and the optimizer as optimizer_scratch below.

In [86]:
import torch.optim as optim

### TODO: select loss function
if use_weights:
    criterion_scratch = nn.CrossEntropyLoss(weight=class_weights.to(device), reduction='sum')
else:    
    criterion_scratch = nn.CrossEntropyLoss()

### TODO: select optimizer
optimizer_scratch = torch.optim.Adam(model_scratch.parameters(), lr=learning_rate)

Establish best learning rate

Code adapted from Sylvain Gugger's How Do You Find A Good Learning Rate

A note about using the learning rate finder

Testing the model with the learning rate finder will leave it with weights already set, so I have found that after finding the learning rate I need to re-create the model to start with a clean slate.
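An alternative to re-creating the model (a sketch; the wrapper name is mine) is to snapshot the model and optimizer state before running the finder and restore it afterwards:

```python
import copy

def run_lr_finder_safely(find_lr, model, optimizer, criterion, loaders):
    """Snapshot model/optimizer state, run the finder, then restore the state."""
    model_state = copy.deepcopy(model.state_dict())
    optim_state = copy.deepcopy(optimizer.state_dict())
    try:
        return find_lr(model, optimizer, criterion, loaders)
    finally:
        # undo the weight updates made while sweeping the learning rate
        model.load_state_dict(model_state)
        optimizer.load_state_dict(optim_state)
```

After this returns, the model starts training from the same clean slate as before the sweep.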

In [87]:
import math

def find_lr(model, optimizer, criterion, dataloaders, init_value = 1e-8, 
            final_value=10., beta = 0.98):
    num = len(dataloaders['train'])-1
    mult = (final_value / init_value) ** (1/num)
    lr = init_value
    optimizer.param_groups[0]['lr'] = lr
    avg_loss = 0.
    best_loss = 0.
    batch_num = 0
    losses = []
    log_lrs = []
    
    for images, labels in dataloaders['train']:
        batch_num += 1
        
        #Get the loss for this mini-batch of images/outputs
        images = images.to(device)
        labels = labels.to(device)
        
        optimizer.zero_grad()
        output = model(images)
        
        loss = criterion(output, labels)
        
        #Compute the smoothed loss
        avg_loss = beta * avg_loss + (1-beta) * loss.item()
        smoothed_loss = avg_loss / (1 - beta**batch_num)
        
        #Stop if the loss is exploding
        if batch_num > 1 and smoothed_loss > 4 * best_loss:
            return log_lrs, losses
        
        #Record the best loss
        if smoothed_loss < best_loss or batch_num==1:
            best_loss = smoothed_loss
        
        #Store the values
        losses.append(smoothed_loss)
        log_lrs.append(math.log10(lr))
        
        #Do the SGD step
        loss.backward()
        optimizer.step()
        
        #Update the lr for the next step
        lr *= mult
        optimizer.param_groups[0]['lr'] = lr
    
    return log_lrs, losses
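The smoothing above is a bias-corrected exponential moving average (the same correction Adam uses): without dividing by `1 - beta**batch_num`, the zero-initialized average would drag the early smoothed losses toward zero. A toy check of just that arithmetic:

```python
# avg = beta*avg + (1-beta)*loss;  smoothed = avg / (1 - beta**t)
# For a constant loss the corrected value is exact from the first batch,
# while the raw average would start near zero.
beta = 0.98
avg_loss = 0.0
smoothed = []
for t, loss in enumerate([5.0, 5.0, 5.0], start=1):
    avg_loss = beta * avg_loss + (1 - beta) * loss
    smoothed.append(avg_loss / (1 - beta ** t))
print(smoothed)  # each entry is 5.0 (up to float rounding)
```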
    
In [88]:
with active_session():
    logs, losses = find_lr(model_scratch, optimizer_scratch, criterion_scratch, loaders_scratch)
In [89]:
plt.rcParams['figure.figsize'] = [9.5, 6]

plt.plot(logs[10:-5],losses[10:-5])
Out[89]:
[<matplotlib.lines.Line2D at 0x7fba05196e48>]

The learning rate finder is a little hard to interpret, but probably 1e-5 is a good choice...

In [90]:
print("{:.10f}".format(1e-05))
print("{:.10f}".format(1e-06))
print("{:.10f}".format(1e-07))
0.0000100000
0.0000010000
0.0000001000
In [91]:
learning_rate = 1e-05

Resetting the optimizer to use the learning rate chosen with the finder...

In [92]:
optimizer_scratch = torch.optim.Adam(model_scratch.parameters(), lr=learning_rate)
In [93]:
get_learning_rate(optimizer_scratch)
Out[93]:
[1e-05]

Run the next cell before running the train function definition below; it also clears the tracking variables for a new run...

In [94]:
train_losses, valid_losses, train_acc_history, val_acc_history, lr_hist = [], [], [], [], [] 
best_acc, best_val_epoch, best_acc_epoch, epoch_loss_min = 0.0, 0, 0, np.Inf

(IMPLEMENTATION) Train and Validate the Model

Train and validate your model in the code cell below. Save the final model parameters at filepath 'model_scratch.pt'.

In [95]:
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path, 
          use_weights=False):
    """Returns trained model"""

    global train_losses, valid_losses, val_acc_history, lr_hist
    global best_acc, best_val_epoch, best_acc_epoch, epoch_loss_min
    
    curr_lr = get_learning_rate(optimizer)
    
    best_model_wts = copy.deepcopy(model.state_dict())
    best_optim_wts = copy.deepcopy(optimizer.state_dict())
    
    since = time.time()

    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf 
    # also track maximum accuracy...
    valid_acc_max  = 0
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        valid_acc  = 0.0
        
        ###################
        # train the model #
        ###################
        model.train()
        
        running_loss = 0.0
        running_corrects = 0        

        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
                
            ## find the loss and update the model parameters accordingly
            optimizer.zero_grad()
            
            outputs = model(data)
            
            loss = criterion(outputs, target)
            
            _, preds = torch.max(outputs, 1)
            
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item() * (1 if use_weights==True else data.size(0))
            running_corrects += torch.sum(preds == target.data)
                
            ## Suggested approach from the project workbook...
            ## record the average training loss, using something like
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
            
        epoch_loss = running_loss / dataset_sizes['train']
        epoch_acc = running_corrects.double() / dataset_sizes['train']

        train_losses.append(epoch_loss)
        train_acc_history.append(epoch_acc)
        
        curr_lr = get_learning_rate(optimizer)
        lr_hist.append(curr_lr)        

        ######################    
        # validate the model #
        ######################
        model.eval()

        running_loss = 0.0
        running_corrects = 0
            
        for batch_idx, (data, target) in enumerate(loaders['valid']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            
            ## update the average validation loss
            val_outputs = model(data)
            
            val_loss = criterion(val_outputs, target)

            _, preds = torch.max(val_outputs, 1)
            
            running_loss += val_loss.item() * (1 if use_weights==True else data.size(0))
            running_corrects += torch.sum(preds == target.data)
            
            valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (val_loss.data - valid_loss))
  
        epoch_val_loss = running_loss / dataset_sizes['valid']
        epoch_val_acc = running_corrects.double() / dataset_sizes['valid']
        
        valid_losses.append(epoch_val_loss)
        val_acc_history.append(epoch_val_acc.item()) 
        
        # print training/validation statistics 
# Tested and found that these calculations render identical results, so keeping my approach...        
#         print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
#             epoch, 
#             train_loss,
#             valid_loss
#             ))
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            epoch_loss,
            epoch_val_loss
            ))        
        print('Epoch: {} \tTraining Accuracy: {:.6f} \tValidation Accuracy: {:.6f}'.format(
            epoch, 
            epoch_acc,
            epoch_val_acc
            ))  
        
        ## TODO: save the model if validation loss has decreased
        if epoch_val_acc > best_acc:
            filename = save_path + '_acc' # + "_" + str(epoch)
            print('Accuracy has increased ({:.6f} --> {:.6f})  Saving model as {}...'.format(
                best_acc, epoch_val_acc, filename))            

            best_acc_epoch = epoch 
            best_acc = epoch_val_acc
            
            best_model_wts = copy.deepcopy(model.state_dict())
            best_optim_wts = copy.deepcopy(optimizer.state_dict())
            
            torch.save(best_model_wts, filename + '.pt')
            torch.save(best_optim_wts, filename + '_optimizer.pt') 
                
        if epoch_val_loss <= epoch_loss_min:
            filename = save_path + '_val' # + "_" + str(epoch)
            print('Validation loss decreased ({:.6f} --> {:.6f})  Saving model as {}...'.format(
            epoch_loss_min, epoch_val_loss, filename))
                
            best_val_epoch = epoch           
            epoch_loss_min = epoch_val_loss
            
            torch.save(model.state_dict(), filename + '.pt')
            torch.save(optimizer.state_dict(), filename + '_optimizer.pt')            
        
        print()
        
    time_elapsed = time.time() - since
    
    print()
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best validation accuracy: {:4f}'.format(best_acc))
    print('Best accuracy epoch     : {}'.format(best_acc_epoch))
    print('Best validation loss    : {:4f}'.format(epoch_loss_min))
    print('Best validation epoch   : {}'.format(best_val_epoch))
    
    # load best accuracy model weights
    model.load_state_dict(best_model_wts)
    optimizer.load_state_dict(best_optim_wts)
    
# Not automatically overwriting model_scratch state_dict as this may not be the best model...
#     # Save to required model_scratch
#     torch.save(model.state_dict(), save_path + '.pt')
#     torch.save(optimizer.state_dict(), save_path + '_optimizer.pt') 

    # return trained model
    return model

Convert to a code cell and run if you want to reset the global tracking variables

In [ ]:
train_losses, valid_losses, train_acc_history, val_acc_history, lr_hist = [], [], [], [], [] 
best_acc, best_val_epoch, best_acc_epoch, epoch_loss_min = 0.0, 0, 0, np.Inf

Training cell below

  • Model finishes with best accuracy state dict loaded
  • To save the model's state dict as model_scratch.pt use the following cell
  • To reload model_scratch.pt use the separate cell below
  • See the charts of 490 epochs of training, which reached 37% test accuracy
  • See Appendix for output of the first 300 epochs of the 48-96-128 model
In [96]:
with active_session():
    # train the model
    model_scratch = train(300, loaders_scratch, model_scratch, optimizer_scratch, 
                          criterion_scratch, use_cuda, 'model_scratch')

# load the model that got the best validation accuracy
#model_scratch.load_state_dict(torch.load('model_scratch.pt'))

Run to overwrite required model_scratch state dict file with this model's state dict

In [97]:
# Save to required model_scratch
torch.save(model_scratch.state_dict(), 'model_scratch.pt')
torch.save(optimizer_scratch.state_dict(), 'model_scratch_optimizer.pt') 

Run to load model_scratch state dict if needed...

In [98]:
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
optimizer_scratch.load_state_dict(torch.load('model_scratch_optimizer.pt'))

Chart training losses and accuracy

In [99]:
plt.rcParams['figure.figsize'] = [9.5, 6]

plt.plot(train_losses[2:], label="Training loss")
plt.plot(valid_losses[2:], label="Validation loss")
plt.legend(frameon=False)
plt.show()
In [100]:
plt.rcParams['figure.figsize'] = [9.5, 6]

plt.plot(val_acc_history[2:], label="Validation accuracy")
plt.legend(frameon=False)
plt.show()

Reload previous best model if required...

Using reloadModel to restore model and optimizer from best accuracy state_dicts, and also restore tracking values...

NOTE: When loading a state dict saved under PyTorch 1.0 in a PyTorch 0.4.0 environment, an error occurs about unexpected keys:

RuntimeError: Error(s) in loading state_dict for Net:
    Unexpected key(s) in state_dict: "layer1.4.num_batches_tracked", "layer2.4.num_batches_tracked", "layer3.4.num_batches_tracked", "classifier.2.num_batches_tracked", "classifier.6.num_batches_tracked".

See Loading part of a pre-trained model: when reloading a state dict prepared in v1.0, load it in stages by executing the code in the cell below instead of a plain model.load_state_dict()...

In [ ]:
model_dict = model_scratch.state_dict()
# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in best_model_wts.items() if k in model_dict}
# 2. overwrite entries in the existing state dict
model_dict.update(pretrained_dict) 
# 3. load the new state dict
model_scratch.load_state_dict(model_dict)
model_scratch.eval();
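If the environment's `load_state_dict` accepts the `strict` keyword (available from PyTorch 0.4), the staged load above can, as far as I can tell, be collapsed into one call. A hedged sketch; `load_compat` is a hypothetical helper, not something used elsewhere in this notebook:

```python
import torch
import torch.nn as nn

def load_compat(model, state_dict):
    """Load a state dict saved under a newer PyTorch, ignoring keys
    the current model doesn't have (e.g. num_batches_tracked)."""
    model.load_state_dict(state_dict, strict=False)
    return model
```

With `strict=False` both unexpected and missing keys are silently skipped, so it is worth spot-checking that the important weights actually loaded.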
In [ ]:
!dir
In [102]:
state_dicts_name = 'model_d_502.pth'
learning_rate=1e-05
reloadModel(model_scratch, optimizer_scratch, state_dicts_name, learning_rate, isV1=True)

...or, just reloading the model and optimizer from either the best-accuracy or best-validation state_dicts

Accuracy...

In [103]:
best_model_wts = torch.load("model_scratch_acc.pt", 
                            map_location=lambda storage, loc: storage)
best_optim_wts = torch.load("model_scratch_acc_optimizer.pt", 
                            map_location=lambda storage, loc: storage)

or Loss...

In [104]:
best_model_wts = torch.load("model_scratch_val.pt", 
                            map_location=lambda storage, loc: storage)
best_optim_wts = torch.load("model_scratch_val_optimizer.pt",
                            map_location=lambda storage, loc: storage)

Load state dicts into optimizer and model...

In [105]:
optimizer_scratch.load_state_dict(best_optim_wts)
In [106]:
model_scratch.load_state_dict(best_model_wts)
model_scratch.eval();

or, loading a v1.0 state dict...

In [107]:
model_dict = model_scratch.state_dict()
pretrained_dict = {k: v for k, v in best_model_wts.items() if k in model_dict}
model_dict.update(pretrained_dict) 
model_scratch.load_state_dict(model_dict)
model_scratch.eval();

Save model state dict to specified name model_scratch.pt if needed

In [ ]:
best_model_wts = copy.deepcopy(model_scratch.state_dict())
best_optim_wts = copy.deepcopy(optimizer_scratch.state_dict())

torch.save(best_model_wts, 'model_scratch.pt')
torch.save(best_optim_wts, 'model_scratch_optimizer.pt') 

Save checkpoint dictionary if needed

In [ ]:
checkpt = 'model_d_502.pth'
torch.save({'model_statedict':model_scratch.state_dict(),
            'optimizer_statedict':optimizer_scratch.state_dict(), 
            'best_acc_epoch' : 402 + 88,
            'best_val_epoch' : 402 + 99,
            'train_losses' : train_losses, 
            'valid_losses' : valid_losses, 
            'val_acc_history' : val_acc_history,
            'train_acc_history' : train_acc_history}, 
           checkpt)
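For completeness, a sketch of the matching load for that checkpoint dictionary; `load_checkpoint` is a hypothetical helper, and the key names simply mirror those used in the save above:

```python
import torch

def load_checkpoint(path, model, optimizer):
    """Restore model/optimizer state dicts and the tracking histories
    from a checkpoint dictionary like the one saved above."""
    ckpt = torch.load(path, map_location=lambda storage, loc: storage)
    model.load_state_dict(ckpt['model_statedict'])
    optimizer.load_state_dict(ckpt['optimizer_statedict'])
    return (ckpt['train_losses'], ckpt['valid_losses'],
            ckpt['val_acc_history'], ckpt['train_acc_history'])
```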

(IMPLEMENTATION) Test the Model

Try out your model on the test dataset of dog images. Use the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 10%.

In [108]:
def test(loaders, model, criterion, use_cuda):

    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    model.eval()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss 
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
        # convert output probabilities to predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to true label
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))
In [109]:
# call test function    
%time test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
Test Loss: 2.503274


Test Accuracy: 37% (316/836)
CPU times: user 16.6 s, sys: 3.05 s, total: 19.7 s
Wall time: 17.9 s

Results of 3 layer CNN

48-96-128 configuration

(15% accuracy at 103 epochs, 29% (243/836) after 300 epochs, 37% (316/836) after 490 epochs)

The model was run to 285 epochs with 224-pixel images, then switched to 299-pixel images until epoch 400, then back to 224 until epoch 490...

  • Image size change had a positive effect on the accuracy, which was stalling somewhat before each change
  • My GPU has limited memory, limiting batch size; this could have worked in fewer epochs had I been able to increase the batch size

1. Test results

Test Loss: 2.503274
Test Accuracy: 37% (316/836)

2. Run results

Best validation accuracy: 0.382036
Best accuracy epoch     : 490
Best validation loss    : 2.434461
Best validation epoch   : 499

3. Loss chart and Accuracy charts

Results of 3 layer CNN

32-64-96 configuration

(15% accuracy at 105 epochs, 27% (228/836) after 300 epochs)

1. Test results

Test Loss: 3.024790
Test Accuracy: 27% (228/836)

2. Run results

Run 1

Training complete in 141m 46s
Best validation accuracy: 0.146108
Best accuracy epoch     : 99
Best validation loss    : 3.840200
Best validation epoch   : 100

Run 2

Training complete in 136m 32s
Best validation accuracy: 0.231138
Best accuracy epoch     : 199
Best validation loss    : 3.267654
Best validation epoch   : 199

Run 3

Training complete in 140m 10s
Best validation accuracy: 0.285030
Best accuracy epoch     : 297
Best validation loss    : 2.968024
Best validation epoch   : 300

3. Loss chart and Accuracy charts

Results of 3 layer CNN with batch normalization

(15% accuracy at 105 epochs, 26% (225/836) after 300 epochs)

1. Test Results

Test Loss: 3.085338
Test Accuracy: 26% (225/836)

2. Run results

Training complete in 442m 23s
Best validation accuracy: 0.267066
Best accuracy epoch     : 291
Best validation loss    : 2.988656
Best validation epoch   : 299

3. Loss chart and Accuracy charts

Results of 6 layer CNN

(15% accuracy at 150 epochs)

6 layer CNN training summary

  1. Training 100 epochs in the Udacity workspace took around 9.1 hours; on a PC with a GTX 1080 the task took 2.2 hours!
  2. Initialized the model trained here with state dicts obtained from the PC after 50 epochs
  3. Therefore results here are after 150 epochs, with validation accuracy at 16.5%
    Training complete in 548m 1s
    Best validation accuracy: 0.165269
    Best accuracy epoch     : 98  (actually 148)
    Best validation loss    : 3.677971
    Best validation epoch   : 100 (actually 150)
  4. After 150 epochs the 6 layer CNN gets 15% test accuracy (132/836)
    Test Loss: 3.738435
    Test Accuracy: 15% (132/836)

Step 4: Create a CNN to Classify Dog Breeds (using Transfer Learning)

You will now use transfer learning to create a CNN that can identify dog breed from images. Your CNN must attain at least 60% accuracy on the test set.

(IMPLEMENTATION) Specify Data Loaders for the Dog Dataset

Use the code cell below to write three separate data loaders for the training, validation, and test datasets of dog images (located at dogImages/train, dogImages/valid, and dogImages/test, respectively).

If you like, you are welcome to use the same data loaders from the previous step, when you created a CNN from scratch.

In [110]:
## TODO: Specify data loaders
import os
from PIL import Image
from PIL import ImageFile
from torchvision import datasets, transforms
ImageFile.LOAD_TRUNCATED_IMAGES = True

# data_dir  = work_dir + 'dog_images/'
# train_dir = data_dir + 'train/'
# valid_dir = data_dir + 'valid/'
# test_dir  = data_dir + 'test/'

data_dir  = orig_dir + 'dog_images/'
train_dir = data_dir + 'train/'
valid_dir = data_dir + 'valid/'
test_dir  = data_dir + 'test/'

batch_size = 128 
num_workers = 0 ## only 0 works in workspace...
image_size = 224 

global img_means, img_std

if img_means is not None:
    imgmeans = img_means
else: 
    imgmeans = [0.485, 0.456, 0.406]

if img_std is not None:
    imgstd = img_std
else: 
    imgstd = [0.229, 0.224, 0.225]
        
data_transforms = {
    'train': transforms.Compose([transforms.RandomAffine(15, translate=(0.1, 0.1), scale=(1.0, 1.5), 
                                                         shear=None, resample=Image.BILINEAR, 
                                                         fillcolor=0),
                                 transforms.Resize(image_size + (image_size//7), 
                                                   interpolation=Image.BILINEAR),
                                 transforms.CenterCrop(image_size),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ColorJitter(brightness=0.3, contrast=0.3, 
                                                        saturation=0.2, hue=0.05), 
                                 transforms.ToTensor(),
                                 transforms.Normalize(imgmeans, imgstd)
                                ]), 
    'valid': transforms.Compose([transforms.Resize(image_size + (image_size//7)),
                                 transforms.CenterCrop(image_size),
                                 transforms.ToTensor(),
                                transforms.Normalize(imgmeans, imgstd)
                                ]),
    'test' : transforms.Compose([transforms.Resize(image_size + (image_size//7)),
                                 transforms.CenterCrop(image_size),
                                 transforms.ToTensor(),
                                 transforms.Normalize(imgmeans, imgstd)
                                ])
}    

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'valid', 'test']}

class_names = image_datasets['train'].classes
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'valid', 'test']}

loaders_transfer = {x: torch.utils.data.DataLoader(image_datasets[x], 
                                              batch_size=batch_size,
                                              shuffle=(x != 'test'), 
                                              num_workers=num_workers)
              for x in ['train', 'valid', 'test']}

Use the ImageFolderWithPaths wrapper for alternative datasets and loaders

so we can retrieve the image file paths when needed

This seems to work only with num_workers = 0

In [111]:
class ImageFolderWithPaths(datasets.ImageFolder):
    """Custom dataset that includes image file paths. Extends
    torchvision.datasets.ImageFolder
    """

    # override the __getitem__ method. this is the method dataloader calls
    def __getitem__(self, index):
        # this is what ImageFolder normally returns 
        original_tuple = super(ImageFolderWithPaths, self).__getitem__(index)
        # the image file path
        path = self.imgs[index][0]
        # make a new tuple that includes original and the path
        tuple_with_path = (original_tuple + (path,))
        return tuple_with_path
In [112]:
test_datasets = {x: ImageFolderWithPaths(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['valid', 'test']}

test_dataloaders = {x: torch.utils.data.DataLoader(test_datasets[x], 
                                              batch_size=batch_size,
                                              shuffle=False)
              for x in ['valid', 'test']}
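Batches from these loaders unpack as 3-tuples rather than the usual 2-tuples. A self-contained sketch of the same pattern; `TensorsWithPaths` is a hypothetical stand-in dataset, used here only to avoid needing actual image files:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TensorsWithPaths(Dataset):
    """Stand-in for ImageFolderWithPaths: returns (sample, label, path)."""
    def __init__(self, samples, labels, paths):
        self.samples, self.labels, self.paths = samples, labels, paths
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, index):
        return self.samples[index], self.labels[index], self.paths[index]

loader = DataLoader(TensorsWithPaths(torch.zeros(4, 3), [0, 1, 0, 1],
                                     ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg']),
                    batch_size=2)
for samples, labels, paths in loader:
    print(paths)  # default collate keeps the strings as a per-batch sequence
```

Iterating `test_dataloaders['test']` unpacks the same way, which is what makes per-image error inspection by path possible.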
In [ ]:
del model_transfer

(IMPLEMENTATION) Model Architecture

Use transfer learning to create a CNN to classify dog breed. Use the code cell below, and save your initialized model as the variable model_transfer.

In [113]:
import torchvision.models as models
import torch.nn as nn

## TODO: Specify model architecture 
class AdaptiveConcatPool2d(nn.Module):
    def __init__(self, sz=None):
        super().__init__()
        sz = sz or (1,1)
        self.ap = nn.AdaptiveAvgPool2d(sz)
        self.mp = nn.AdaptiveMaxPool2d(sz)
    def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)

class Flatten(nn.Module):
    def __init__(self):
        super(Flatten, self).__init__()

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return x
    
def get_resnet_model(modelname='resnet50', swap=False, sm=False):
    if modelname == 'resnet18':
        model = models.resnet18(pretrained=True)   
    elif modelname == 'resnet34':    
        model = models.resnet34(pretrained=True)   
    elif modelname == 'resnet50':    
        model = models.resnet50(pretrained=True)
    elif modelname == 'resnet101':    
        model = models.resnet101(pretrained=True)
    elif modelname == 'resnet152':    
        model = models.resnet152(pretrained=True)
    else:
        model = models.resnet50(pretrained=True)
        
    for param in model.parameters():
        param.requires_grad = False    
        
    clf_input_size = model.fc.in_features
    clf_output_size = 133 # len(class_names)
    nf = clf_input_size * 2 # For flattening ...
    
    if swap==True:
        # swapping last 2 layers for new ones - adaptive max pooling and 
        # more linear layers with dropout
        Resnetlayers = []
        Resnetlayers.append(AdaptiveConcatPool2d())
        Resnetlayers.append(Flatten())
        Resnetlayers.append(nn.BatchNorm1d(nf, eps=1e-05, momentum=0.1, affine=True, 
                                           track_running_stats=True))
        Resnetlayers.append(nn.Dropout(p=0.25))
        Resnetlayers.append(nn.Linear(in_features=nf, out_features=512, bias=True))
        Resnetlayers.append(nn.ReLU())
        Resnetlayers.append(nn.BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, 
                                           track_running_stats=True))
        Resnetlayers.append(nn.Dropout(p=0.25))
        Resnetlayers.append(nn.Linear(in_features=512, out_features=clf_output_size, 
                                      bias=True))        
        if sm==True:
            Resnetlayers.append(nn.LogSoftmax())
            
        Resnetlist = list(model.children())[:-2] + Resnetlayers
        model = torch.nn.Sequential(*Resnetlist)
    else:    
        # standard replacing the fc layer to suit the number of classes
        classifier = nn.Linear(clf_input_size, clf_output_size)
        model.fc = classifier

    return model

model_name = "resnet152"

model_transfer = get_resnet_model(model_name, swap=False, sm=False)

if use_cuda:
    model_transfer = model_transfer.cuda()
Downloading: "https://download.pytorch.org/models/resnet152-b121ed2d.pth" to /root/.torch/models/resnet152-b121ed2d.pth
100%|██████████| 241530880/241530880 [00:04<00:00, 59502451.84it/s]
In [ ]:
print(model_transfer)

Question 5: Outline the steps you took to get to your final CNN architecture and your reasoning at each step. Describe why you think the architecture is suitable for the current problem.

Answer:

  • A resnet50 pre-trained ImageNet model was chosen:
    • an ImageNet model was an obvious choice for transfer learning
    • the resnet architecture was picked because of previous familiarity with it and obtaining pretty good results
    • the model is deep enough to obtain good results yet is not difficult to work with
    • the state dictionaries have a reasonable size whereas the VGG model state dicts are much larger
  • I treated this as an example of a "new dataset that is small and similar to the original dataset", given that ImageNet contains 118 dog classes that overlap with these 133 breeds
  • Which meant that I simply replaced the single classifier layer with another single layer, but with 133 outputs
  • And froze all but the classifier layer so training was just on the classifier
  • There is a lot of conflicting information about these terms, but since all layers except the new classifier were frozen, the approach is best characterized as "feature extraction" rather than "fine-tuning". In practice the only difference this made is that I didn't unfreeze any of the earlier layers for further training, though I suspect that doing so could have pushed beyond the 90% accuracy achieved
  • I used a function get_resnet_model to replace the classifier; it can also swap in a head of adaptive pooling and 2 linear layers with dropout, the same head I used in the model designed from scratch. I tested this configuration and report the results below.

Set use_weights to True if wanting to use the weights to compensate for class imbalance

In [ ]:
from collections import defaultdict
class_counts = defaultdict(int)
for _, c in image_datasets["train"].imgs:
    class_counts[c] += 1

class_weights = [1-(float(class_counts[class_id])/len(image_datasets["train"].imgs))
                 for class_id in range(len(image_datasets["train"].classes))]
class_weights = torch.FloatTensor(class_weights)
class_weights = class_weights.to(device)  # .to() is not in-place, so reassign
In [116]:
use_weights = False
In [117]:
learning_rate = 0.05

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_transfer, and the optimizer as optimizer_transfer below.

In [118]:
import torch.optim as optim

if use_weights:
    criterion_transfer = nn.CrossEntropyLoss(weight=class_weights.to(device), reduction='sum')
else:    
    criterion_transfer = nn.CrossEntropyLoss()

parameters = filter(lambda p: p.requires_grad, model_transfer.parameters())

optimizer_transfer = optim.Adam(parameters, lr=learning_rate)

Scheduler

In [119]:
from torch.optim import lr_scheduler

multistep = False
OnPlateau = False

# Decay LR by a factor of 0.1 every 14 epochs (9 for inception)
if OnPlateau == True:
    #exp_lr_scheduler = lr_scheduler.ReduceLROnPlateau(optimizer_transfer, mode='max', 
    #                                                  factor=0.5, patience=5, min_lr=0.000001)
    exp_lr_scheduler = lr_scheduler.ReduceLROnPlateau(optimizer_transfer, mode='max', 
                                                      factor=0.5, patience=5)
elif multistep==True:
    exp_lr_scheduler = lr_scheduler.MultiStepLR(optimizer_transfer, [7, 13, 23], 
                                                gamma=0.05) # gamma=0.1
else:
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_transfer, 10, gamma=0.1)
    #exp_lr_scheduler = lr_scheduler.StepLR(optimizer_transfer, 10, gamma=0.05)
    # exp_lr_scheduler = lr_scheduler.StepLR(optimizer_transfer, 5, gamma=0.1)
In [120]:
exp_lr_scheduler
Out[120]:
<torch.optim.lr_scheduler.StepLR at 0x7f46e83b4240>

Determine learning rate

In [121]:
with active_session():
    logs, losses = find_lr(model_transfer, optimizer_transfer, criterion_transfer, loaders_transfer)
In [122]:
plt.rcParams['figure.figsize'] = [9.5, 6]
plt.plot(logs[10:-7],losses[10:-7])
Out[122]:
[<matplotlib.lines.Line2D at 0x7f777c8e8940>]

Training loop

In [123]:
train_losses, valid_losses, train_acc_history, val_acc_history, lr_hist = [], [], [], [], [] 
best_acc, best_val_epoch, best_acc_epoch, epoch_loss_min = 0.0, 0, 0, np.Inf
In [124]:
def train_model(model, criterion, optimizer, dataloaders, scheduler, use_scheduler=False, 
                use_weights=False, num_epochs=50, first_epoch=0, ufilename=None, 
                is_inception=False):
    
    global train_losses, valid_losses, val_acc_history, lr_hist
    global best_acc, best_val_epoch, best_acc_epoch, epoch_loss_min
    global model_name
    
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_optim_wts = copy.deepcopy(optimizer.state_dict())

    curr_lr = get_learning_rate(optimizer)
    
    # if first_epoch is used, assume it's a restart of training at that point using 1-based indexing,
    # so decrement by 1 then add the value to num_epochs
    if first_epoch > 0:
        first_epoch -= 1
        num_epochs = first_epoch + num_epochs
    
    for epoch in range(first_epoch, num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('-' * 11)

        # Each epoch has a training and validation phase
        for phase in ['train', 'valid']:
            if phase == 'train':
                if use_scheduler==True: 
                    if isinstance(scheduler, torch.optim.lr_scheduler.ReduceLROnPlateau):
                        # using best accuracy - the max setting
                        scheduler.step(best_acc)
                    else:    
                        scheduler.step()
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs, labels = inputs.to(device), labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
# ref https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                    # Special case for inception because in training it has an auxiliary output. 
                    # In train mode we calculate the loss by summing the final output and 
                    # the auxiliary output
                    # but in testing we only consider the final output.
                    
                    outputs = model(inputs)
                    
                    if isinstance(outputs, tuple):
                        if epoch == first_epoch:
                            print("Outputs:", outputs)
                        
                        if is_inception and phase == 'train':
                            loss1 = criterion(outputs[0], labels)
                            loss2 = criterion(outputs[1], labels)
                            loss = loss1 + 0.4*loss2
                        else:    
                            loss = sum(criterion(o, labels) for o in outputs)
                    else:
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)
                    
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                # (with a weighted criterion the batch loss is already a
                # weighted mean, so it is accumulated without rescaling)
                running_loss += loss.item() * (1 if use_weights else inputs.size(0))
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            if phase == 'train':
                train_losses.append(epoch_loss)
                curr_lr = get_learning_rate(optimizer)
                lr_hist.append(curr_lr)
                
                print("Learning rate: {}".format(curr_lr))
                
            if phase == 'valid':
                valid_losses.append(epoch_loss)
                val_acc_history.append(epoch_acc.item())
                                
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'valid' and epoch_acc > best_acc:
                filename = model_name + '_acc_'
                if ufilename is not None:
                    filename = filename + ufilename + '_'
                else:    
                    if use_scheduler==True:
                        filename = filename + 'schd_'
                    if use_weights==True:
                        filename = filename + 'wgts_'
                ### filename = filename + str(epoch+1) 
                if filename[-1:] == "_":
                    filename = filename[:-1]
                    
                print(
                    'Accuracy has increased ({:.6f} --> {:.6f}).  Saving model as {}...'.format(
                    best_acc, epoch_acc,
                    filename))

                best_acc_epoch = epoch+1
                best_acc = epoch_acc
                
                best_model_wts = copy.deepcopy(model.state_dict())
                best_optim_wts = copy.deepcopy(optimizer.state_dict())
                torch.save(best_model_wts, filename + '.pt')
                torch.save(best_optim_wts, filename + '_optimizer.pt')               
                
            if phase == 'valid' and epoch_loss <= epoch_loss_min:
                filename = model_name + '_'
                if ufilename is not None:
                    filename = filename + ufilename + '_'
                else:    
                    if use_scheduler==True:
                        filename = filename + 'schd_'
                    if use_weights==True:
                        filename = filename + 'wgts_'
                ### filename = filename + str(epoch+1) 
                if filename[-1:] == "_":
                    filename = filename[:-1]
                
                print(
                    'Validation loss decreased ({:.6f} --> {:.6f}).  Saving model as {}...'.format(
                epoch_loss_min,
                epoch_loss,
                filename))
                
                best_val_epoch = epoch+1
                epoch_loss_min = epoch_loss
                
                torch.save(model.state_dict(), filename + '.pt')
                torch.save(optimizer.state_dict(), filename + '_optimizer.pt')

        print()

    time_elapsed = time.time() - since
    
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best validation Accuracy: {:.4f}'.format(best_acc))
    print('Best accuracy epoch     : {}'.format(best_acc_epoch))
    print('Best validation Loss    : {:.4f}'.format(epoch_loss_min))
    print('Best validation epoch   : {}'.format(best_val_epoch))
    
    # load best model weights
    model.load_state_dict(best_model_wts)
    optimizer.load_state_dict(best_optim_wts)

# Not automatically overwriting model_transfer state_dict as this may not be the best model...
#     # Save to required model_scratch
#     torch.save(model.state_dict(), 'model_transfer.pt')
#     torch.save(optimizer.state_dict(), 'model_transfer_optimizer.pt') 

    return model

(IMPLEMENTATION) Train and Validate the Model

Train and validate your model in the code cell below. Save the final model parameters at filepath 'model_transfer.pt'.

The training loop - "1 cycle" approach:

In [125]:
# experiment with training loop inspired by the 1 cycle approach, 
# starting with a low rate, up to very high, then back again...
step = 10
start_epoch = 1
end_epoch = start_epoch + step

start = 1e-05
middle = 1e-03
end = 1e-04

lr = start
start_epoch = 1

with active_session():
    # train the model
    
    while lr < middle:
        print(lr)
        parameters = filter(lambda p: p.requires_grad, model_transfer.parameters())
        optimizer_transfer = optim.Adam(parameters, lr=lr)
        end_epoch = start_epoch + step - 1
        print(str(start_epoch), " --> ", str(end_epoch))
        print()

        model_transfer = train_model(model_transfer, 
                        criterion_transfer, 
                        optimizer_transfer,
                        loaders_transfer,         
                        exp_lr_scheduler, 
                        use_scheduler=False, 
                        use_weights=use_weights, 
                        num_epochs=step,
                        first_epoch=start_epoch,
                        ufilename="1cycle_1",
                        is_inception=(model_name=="inception"))

        print()
        print('=' * 20)
        print()

        start_epoch = end_epoch + 1

        lr *= 10
        lr = round(lr, 8)

        # set step to 15 for remaining up cycle
        step = 15

# break here if wanting to split training before the down cycle
# (if so, it will need its own active_session wrapper)
    # with active_session():
    step = 15    
    while lr >= end:
        print(lr)
        parameters = filter(lambda p: p.requires_grad, model_transfer.parameters())
        optimizer_transfer = optim.Adam(parameters, lr=lr)
        end_epoch = start_epoch + step - 1
        print(str(start_epoch), " --> ", str(end_epoch))
        print()

        model_transfer = train_model(model_transfer, 
                        criterion_transfer, 
                        optimizer_transfer,
                        loaders_transfer,         
                        exp_lr_scheduler, 
                        use_scheduler=False, 
                        use_weights=use_weights, 
                        num_epochs=step,
                        first_epoch=start_epoch,
                        ufilename="1cycle_1",
                        is_inception=(model_name=="inception"))

        print()
        print('=' * 20)
        print()

        start_epoch = end_epoch + 1

        lr *= 1/10
        lr = round(lr, 8)
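
The two while-loops above step the learning rate up by a factor of 10 until the peak, then back down by a factor of 10. As a standalone illustration (the helper name is mine, not from the notebook), the sequence of rates a run visits can be computed like this:

```python
# Hedged sketch (helper name is illustrative, not from the notebook):
# reproduce the sequence of learning rates the two while-loops visit,
# multiplying by 10 up to the peak and dividing by 10 back down.
def one_cycle_lrs(start, middle, end):
    lrs = []
    lr = start
    while lr < middle:              # up cycle
        lrs.append(lr)
        lr = round(lr * 10, 8)
    while lr >= end:                # down cycle
        lrs.append(lr)
        lr = round(lr / 10, 8)
    return lrs

# the rates used for the cell above: 1e-05 up to 1e-03, back to 1e-04
print(one_cycle_lrs(1e-05, 1e-03, 1e-04))   # [1e-05, 0.0001, 0.001, 0.0001]
```

PyTorch also ships `torch.optim.lr_scheduler.OneCycleLR`, which implements a smoother, per-batch version of this policy and could replace the manual loop.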

Chart "1 cycle" training losses and accuracy

In [126]:
plt.rcParams['figure.figsize'] = [9.5, 6]

plt.plot(train_losses[2:], label="Training loss")
plt.plot(valid_losses[2:], label="Validation loss")
plt.legend(frameon=False)
plt.show()
In [127]:
plt.rcParams['figure.figsize'] = [9.5, 6]

plt.plot(val_acc_history[2:], label="Validation accuracy")
plt.legend(frameon=False)
plt.show()

"1 cycle" variation, based on 0.005 learning rate

  • this was a more natural rate as suggested by the learning rate finder
  • it converged more quickly, was smoother, but ultimately was fractionally less accurate (see analysis below)
In [128]:
step = 15
start_epoch = 1
end_epoch = start_epoch + step

start = 0.00005
middle = 0.00005
end = 0.000005

lr = start
start_epoch = 1

with active_session():    
    while lr <= middle:
        print(lr)
        parameters = filter(lambda p: p.requires_grad, model_transfer.parameters())
        optimizer_transfer = optim.Adam(parameters, lr=lr)
        end_epoch = start_epoch + step - 1
        print(str(start_epoch), " --> ", str(end_epoch))
        print()

        model_transfer = train_model(model_transfer, 
                        criterion_transfer, 
                        optimizer_transfer,
                        loaders_transfer,         
                        exp_lr_scheduler, 
                        use_scheduler=False, 
                        use_weights=use_weights, 
                        num_epochs=step,
                        first_epoch=start_epoch,
                        ufilename="1cycle_4",
                        is_inception=(model_name=="inception"))

        print()
        print('=' * 20)
        print()

        start_epoch = end_epoch + 1

        lr *= 10
        lr = round(lr, 8)


    # with active_session():
    # 15 epochs per step for the remainder of the training run
    step = 15    
    while lr >= end:
        print(lr)
        parameters = filter(lambda p: p.requires_grad, model_transfer.parameters())
        optimizer_transfer = optim.Adam(parameters, lr=lr)
        end_epoch = start_epoch + step - 1
        print(str(start_epoch), " --> ", str(end_epoch))
        print()

        model_transfer = train_model(model_transfer, 
                        criterion_transfer, 
                        optimizer_transfer,
                        loaders_transfer,         
                        exp_lr_scheduler, 
                        use_scheduler=False, 
                        use_weights=use_weights, 
                        num_epochs=step,
                        first_epoch=start_epoch,
                        ufilename="1cycle_4",
                        is_inception=(model_name=="inception"))

        print()
        print('=' * 20)
        print()

        start_epoch = end_epoch + 1

        lr *= 1/10
        lr = round(lr, 8)
In [129]:
plt.rcParams['figure.figsize'] = [9.5, 6]

plt.plot(train_losses, label="Training loss")
plt.plot(valid_losses, label="Validation loss")
plt.legend(frameon=False)
plt.show()
In [130]:
plt.rcParams['figure.figsize'] = [9.5, 6]

plt.plot(val_acc_history, label="Validation accuracy")
plt.legend(frameon=False)
plt.show()

The classic approach

  • See comparative results below
  • Uses a learning rate scheduler
  • Can be used iteratively to add more training by adjusting num_epochs and first_epoch
  • Setting first_epoch to the next projected epoch number appends to the current epoch count and to the arrays of validation loss and accuracy
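
The `exp_lr_scheduler` passed to `train_model` is defined earlier in the notebook; per the analysis further down, it reduces the learning rate by a factor of 10 every 10 epochs. A minimal standalone sketch of that step-decay rule (the helper name is mine):

```python
# Hedged sketch of the step-decay rule described in the analysis below
# (equivalent in effect to torch.optim.lr_scheduler.StepLR with
# step_size=10, gamma=0.1); the helper name is illustrative.
def step_decay_lr(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate at a 0-based epoch: scaled by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)

print(step_decay_lr(0.05, 0))             # 0.05
print(round(step_decay_lr(0.05, 10), 6))  # 0.005
print(round(step_decay_lr(0.05, 25), 6))  # 0.0005
```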
In [ ]:
with active_session():
    # train the model
    model_transfer = train_model(model_transfer, 
                        criterion_transfer, 
                        optimizer_transfer,
                        loaders_transfer,         
                        exp_lr_scheduler, 
                        use_scheduler=True, 
                        use_weights=use_weights, 
                        num_epochs=50,
                        first_epoch=1,
                        ufilename="std",
                        is_inception=(model_name=="inception"))

# load the model that got the best validation accuracy (uncomment the line below)
# model_transfer.load_state_dict(torch.load('model_transfer.pt'))

Chart classic training losses and accuracy

In [132]:
plt.rcParams['figure.figsize'] = [9.5, 6]

plt.plot(train_losses[2:], label="Training loss")
plt.plot(valid_losses[2:], label="Validation loss")
plt.legend(frameon=False)
plt.show()
In [133]:
plt.rcParams['figure.figsize'] = [9.5, 6]

plt.plot(val_acc_history[2:], label="Validation accuracy")
plt.legend(frameon=False)
plt.show()

Run to overwrite the required model_transfer state dict file with this model's state dict

# Save to the required model_transfer.pt
torch.save(model_transfer.state_dict(), 'model_transfer.pt')
torch.save(optimizer_transfer.state_dict(), 'model_transfer_optimizer.pt')

Run to load model_transfer state dict if needed...

In [134]:
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
optimizer_transfer.load_state_dict(torch.load('model_transfer_optimizer.pt'))

Save checkpoint data if needed

In [ ]:
checkpt = 'model_1cycle_1.pth'
torch.save({'model_statedict':model_transfer.state_dict(),
            'optimizer_statedict':optimizer_transfer.state_dict(), 
            'best_acc_epoch' : 48,
            'best_val_epoch' : 48,
            'train_losses' : train_losses, 
            'valid_losses' : valid_losses, 
            'val_acc_history' : val_acc_history,
            'train_acc_history' : train_acc_history}, 
           checkpt)
In [ ]:
checkpt = 'model_1cycle_8.pth'
torch.save({'model_statedict':model_transfer.state_dict(),
            'optimizer_statedict':optimizer_transfer.state_dict(), 
            'best_acc_epoch' : 45,
            'best_val_epoch' : 57,
            'train_losses' : train_losses, 
            'valid_losses' : valid_losses, 
            'val_acc_history' : val_acc_history,
            'train_acc_history' : train_acc_history}, 
           checkpt)
In [ ]:
checkpt = 'model_standard.pth'
torch.save({'model_statedict':model_transfer.state_dict(),
            'optimizer_statedict':optimizer_transfer.state_dict(), 
            'best_acc_epoch' : 45,
            'best_val_epoch' : 45,
            'train_losses' : train_losses, 
            'valid_losses' : valid_losses, 
            'val_acc_history' : val_acc_history,
            'train_acc_history' : train_acc_history}, 
           checkpt)
In [ ]:
checkpt = 'model_standard_mk2.pth'
torch.save({'model_statedict':model_transfer.state_dict(),
            'optimizer_statedict':optimizer_transfer.state_dict(), 
            'best_acc_epoch' : 24,
            'best_val_epoch' : 41,
            'train_losses' : train_losses, 
            'valid_losses' : valid_losses, 
            'val_acc_history' : val_acc_history,
            'train_acc_history' : train_acc_history}, 
           checkpt)

Re-load checkpoint dictionary if needed

These checkpoint dictionaries contain the various arrays created during training, so they can be used to recreate the charts

  • model_1cycle_1 contains the same state_dict as is saved in model_transfer.pt
  • model_1cycle_8 contains the state_dict obtained from the model trained with the variation of the 1 cycle (above)
  • model_standard_mk2 contains the state_dict of a model trained with a standard training loop (also above)
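
The `reloadModel` helper used below is defined earlier in the notebook and not shown in this excerpt. The unpacking of a checkpoint dictionary loaded via `torch.load(checkpt)` reduces to something like this (the helper name is mine; a plain dict stands in for the loaded file so the sketch runs standalone):

```python
# Hedged sketch (helper name is illustrative): unpack the checkpoint
# dictionaries saved above, as would be loaded via torch.load(checkpt).
def unpack_checkpoint(checkpoint):
    """Return (model_state, optimizer_state, history) from a checkpoint dict."""
    history = {k: checkpoint[k] for k in
               ('train_losses', 'valid_losses', 'val_acc_history')}
    return (checkpoint['model_statedict'],
            checkpoint['optimizer_statedict'],
            history)

# illustrative stand-in for a dict returned by torch.load(...)
ckpt = {'model_statedict': {'fc.weight': '...'},
        'optimizer_statedict': {'lr': 0.001},
        'best_acc_epoch': 48, 'best_val_epoch': 48,
        'train_losses': [1.2, 0.8], 'valid_losses': [1.0, 0.7],
        'val_acc_history': [0.6, 0.8]}
model_state, optim_state, history = unpack_checkpoint(ckpt)
print(history['val_acc_history'])   # [0.6, 0.8]
```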
In [135]:
state_dicts_name = 'model_1cycle_1.pth'
learning_rate=0.001
reloadModel(model_transfer, optimizer_transfer, state_dicts_name, learning_rate, isV1=True)
In [ ]:
state_dicts_name = 'model_1cycle_8.pth'
learning_rate=0.005
reloadModel(model_transfer, optimizer_transfer, state_dicts_name, learning_rate, isV1=True)
In [ ]:
state_dicts_name = 'model_standard_mk2.pth'
learning_rate=0.05
reloadModel(model_transfer, optimizer_transfer, state_dicts_name, learning_rate, isV1=False)

(IMPLEMENTATION) Test the Model

Try out your model on the test dataset of dog images. Use the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 60%.
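
The `test()` helper called below is defined earlier in the notebook; its accuracy bookkeeping amounts to counting matching predictions, which can be shown standalone with plain lists (the function name is mine):

```python
# Hedged sketch (helper name is illustrative): the arithmetic behind
# "Test Accuracy: N% (correct/total)", using plain lists in place of
# prediction/label tensors.
def accuracy_report(preds, labels):
    correct = sum(p == l for p, l in zip(preds, labels))
    total = len(labels)
    return 'Test Accuracy: {:.0f}% ({}/{})'.format(
        100.0 * correct / total, correct, total)

print(accuracy_report([1, 2, 3, 3], [1, 2, 0, 3]))  # Test Accuracy: 75% (3/4)
```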

Test best model

  • load from model_transfer.pt or model_1cycle_1.pth
In [136]:
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
Test Loss: 0.357339


Test Accuracy: 90% (753/836)

Test model trained with variation of 1cycle

  • load from model_1cycle_8.pth
In [137]:
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
Test Loss: 0.370314


Test Accuracy: 89% (751/836)

Test model trained with classic approach

  • load from model_standard_mk2.pth
In [138]:
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
Test Loss: 3.045314


Test Accuracy: 88% (740/836)

Analysis of transfer model variations

The best-performing training cycle used the "1 cycle" approach, with a very slow build-up from a learning rate of 1e-07 to 1e-03, then back down to 1e-04.

This was evaluated against a number of other approaches, including an initial standard run as a baseline.

With the "1 cycle" approach, the first thing I tried was to go all the way from 1e-07 up to 1e-01 and back down to 1e-08. This showed that there was no further useful activity above 1e-03, and nothing useful happening below 1e-04 on the way back down.

I also assessed cycles around a 5e-03 peak as variations around this were more in line with the ideal learning rate.

The working training cycle:

  • 1e-07 --> 1e-03 --> 1e-04
  • Best accuracy at epoch 48
  • Trained for 55 epochs
  • The 90% accuracy was the best, so this model is used, but a possibly more predictable approach was used in variation 1
  • Reported test results:
    Test Loss: 0.357339
    Test Accuracy: 90% (753/836)


Extend learning for 20 additional epochs, going down to 1e-06:

  • 1e-07 --> 1e-03 --> 1e-06
  • Same model as the working model, just added the epochs
  • There was no further improvement beyond the original epoch 48


Variation of one cycle approach: use 0.05 learning rate as basis of adjustments

  • 5e-05 --> 5e-04 --> 5e-06
  • This was the culmination of a series of tests around using fractions of 0.05 as a learning rate
  • 0.05 was chosen based on the learning rate finder
  • The best approach was to start at 0.0005 and go up just to 0.005 before coming down again
  • An experiment with weight decay for the latter part of the training did not yield a benefit
  • The 2-layer classifier was tested and performed better on validation data but not as well on test data
  • Trained for 60 epochs
  • Best accuracy at epoch 45
  • Accuracy was 89.8%, almost as good as the best training loop, while being more predictable and requiring less effort
  • Reported test results:
    Test Loss: 0.370314
    Test Accuracy: 89% (751/836)


Standard exploratory run

  • Ran 30 cycles with a learning rate of 1e-05 to analyse model, data, and training losses


Standard approach: starting from 0.05 and decreasing lr:

  • Also the culmination of several tests...
  • Started with a learning rate of 0.05, based on learning rate finder
  • Reduced learning rate by a factor of 10 every 10 epochs
  • Best accuracy was at epoch 24, at 88.5% (740/836)
  • The training and validation losses had diverged considerably
  • I evaluated a weight decay of 1e-04, which brought the losses a little closer together but didn't affect the outcome
  • At this stage fine-tuning was familiar territory, so I decided instead to devise and test a "1 cycle" approach
  • By comparison, the 1 cycle approach keeps the training and validation numbers closely aligned, and accuracy increases by 1.5%
  • Reported test results:
    Test Loss: 3.045314            - 1 cycle was 0.357339
    Test Accuracy: 88% (740/836)   - 1 cycle was 90% (753/836)


Training conclusion

The "1 cycle" approach has worked fairly well, it is gained a 1.5% increase in accuracy and the maintenance of a much closer relationship between training and validation losses, when compared with a standard approach (and not truncating the accuracy calculation). There is more to investigate about it so that I implement it more consistently and according to the intended design, but even this hack at it was an eye-opener.


Set up standard ImageNet predictor to assist with assessing non-dog images

In [139]:
import torch
import torch.nn.functional as F
import torchvision.models as models
# note: ast.literal_eval would be a safer choice than eval here
ImageNetDict = eval(open("imagenet1000_clsidx_to_labels.txt").read())

resnetImageNet = models.resnet152(pretrained=True)
resnetImageNet.class_to_name = ImageNetDict
resnetImageNet.eval();
In [140]:
def ImageNet_predict(img_path, topk=1):
    '''
    Use a pre-trained resnet152 model to obtain the index corresponding to
    the predicted ImageNet class for the image at the specified path.
    
    Args:
        img_path: path to an image
        topk: number of predictions to return (allows for top 5, for instance)
        
    Returns:
        Indexes of the resnet152 model's top predictions
        The corresponding class names
        The probability of each prediction
    '''
    
    img = img_process(img_path)
    img_tensor = torch.from_numpy(img).type(torch.FloatTensor)
    img_tensor.unsqueeze_(0)

    resnetImageNet.cpu()
    resnetImageNet.eval()
    
    with torch.no_grad():
        # softmax yields probabilities (not log-probabilities)
        ps = F.softmax(resnetImageNet.forward(img_tensor), dim=1)
        probs, classes = torch.topk(ps, k=topk)

        probs = probs.view(topk).detach().numpy().tolist()
        classes = classes.view(topk).detach().numpy().tolist()
        classnames = [resnetImageNet.class_to_name[cls] for cls in classes]
        
    return classes, classnames, probs

def ImageNet_dog_detector(img_path, inclPrediction=False):
    prediction = ImageNet_predict(img_path)
    
    # ImageNet class indices 151-268 are dog breeds
    dog_detected = (prediction[0][0] in range(151, 269))
    
    if inclPrediction==True:
        return (dog_detected, prediction)
    else:    
        return dog_detected

(IMPLEMENTATION) Predict Dog Breed with the Model

Write a function that takes an image path as input and returns the dog breed (Affenpinscher, Afghan hound, etc) that is predicted by your model.

In [141]:
### TODO: Write a function that takes a path to an image as input
### and returns the dog breed that is predicted by the model.

# list of class names by index, i.e. a name can be accessed like class_names[0]
class_names = [item[4:].replace("_", " ") for item in image_datasets['train'].classes]
class_examples = eval(open("dog_examples.txt").read())

model_transfer.class_to_name = dict(zip(range(133), class_names))
model_transfer.class_to_example = class_examples

def predict_breed_transfer(img_path, topk=1):
    # load the image and return the predicted breed
    img = img_process(img_path)
    
    img_tensor = torch.from_numpy(img).type(torch.FloatTensor)
    img_tensor.unsqueeze_(0)

    model_transfer.cpu()
    model_transfer.eval()
    
    with torch.no_grad():
        # softmax yields probabilities (not log-probabilities)
        ps = F.softmax(model_transfer.forward(img_tensor), dim=1)
        probs, classes = torch.topk(ps, k=topk)

        probs = probs.view(topk).detach().numpy().tolist()
        classes = classes.view(topk).detach().numpy().tolist()
        classnames = [model_transfer.class_to_name[cls] for cls in classes]
        classexamples = [model_transfer.class_to_example[cls] for cls in classes]
        
    return classnames, classes, probs, classexamples   
In [142]:
predict_breed_transfer('/data/dog_images/test/011.Australian_cattle_dog/Australian_cattle_dog_00728.jpg', 5)
Out[142]:
(['Australian cattle dog',
  'Canaan dog',
  'Finnish spitz',
  'Icelandic sheepdog',
  'Cardigan welsh corgi'],
 [10, 42, 66, 83, 44],
 [0.7505977153778076,
  0.07450136542320251,
  0.07055043429136276,
  0.061345312744379044,
  0.009592127054929733],
 ['/data/dog_images/train/011.Australian_cattle_dog/Australian_cattle_dog_00723.jpg',
  '/data/dog_images/train/043.Canaan_dog/Canaan_dog_03061.jpg',
  '/data/dog_images/train/067.Finnish_spitz/Finnish_spitz_04655.jpg',
  '/data/dog_images/train/084.Icelandic_sheepdog/Icelandic_sheepdog_05715.jpg',
  '/data/dog_images/train/045.Cardigan_welsh_corgi/Cardigan_welsh_corgi_03213.jpg'])

Step 5: Write your Algorithm

Write an algorithm that accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,

  • if a dog is detected in the image, return the predicted breed.
  • if a human is detected in the image, return the resembling dog breed.
  • if neither is detected in the image, provide output that indicates an error.

You are welcome to write your own functions for detecting humans and dogs in images, but feel free to use the face_detector and human_detector functions developed above. You are required to use your CNN from Step 4 to predict dog breed.

Some sample output for our algorithm is provided below, but feel free to design your own user experience!

Sample Human Output

(IMPLEMENTATION) Write your Algorithm

In [143]:
### TODO: Write your algorithm.
### Feel free to use as many code cells as needed.
In [144]:
import ntpath
from PIL import Image, ImageFile, ImageFont, ImageDraw

def face_detector2(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    faces_found = len(faces) > 0
    if faces_found:
        for (x,y,w,h) in faces:
            # add bounding box to color image
            cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2) 
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    return (faces_found, img)

def process_prediction(img_path, threshold=0.25):
    ## handle cases for a human face, dog, and neither
    isdog = False
    ishuman = False
    isunknown = False
    altClass = ""
    
    isFromDogData = ('dog_images/' in img_path.lower()) or ('dog_images\\' in img_path.lower())
    isFromHumanData = ('lfw/' in img_path.lower()) or ('lfw\\' in img_path.lower())

    dog_breed_data = predict_breed_transfer(img_path)
    imagenet_data = ImageNet_dog_detector(img_path, True)
    altClass = imagenet_data[1][0]
    altClassName = imagenet_data[1][1]
    
    prediction = dog_breed_data[2][0]
    
    # Signify a dog if the probability of a dog is above the threshold
    isdog = (prediction > threshold)
        
    # If the probability of a dog falls below the threshold and a standard
    # ImageNet classifier returns non-dog, mark it unknown
    if isdog == False:
        if imagenet_data[0] == False:
            isunknown = True
        
    face_data = face_detector2(img_path)
    
    if face_data[0] > 0 or altClass in [834]:
        # could be a human
        ishuman = True
        img = face_data[1]
    else:
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    # ground truth image... square crop it
    # see http://corochann.com/basic-image-processing-tutorial-1220.html
    ground_truth_img = cv2.imread(dog_breed_data[3][0])
    ground_truth_img = cv2.cvtColor(ground_truth_img, cv2.COLOR_BGR2RGB)
    height, width = ground_truth_img.shape[:2]
    crop_length = min(height, width)
    height_start = (height - crop_length) // 2
    width_start = (width - crop_length) // 2
    ground_truth_img = ground_truth_img[
                    height_start:height_start+crop_length, 
                    width_start:width_start+crop_length,
                    :]

    return dog_breed_data, imagenet_data, [isdog, ishuman, isunknown, 
                                           isFromDogData, isFromHumanData], img, ground_truth_img

def run_app(img_path, threshold=0.35):
    img_name = ntpath.basename(ntpath.dirname(img_path)) + "/" + ntpath.basename(img_path)
    subject_name = ntpath.basename(img_path).replace("_", " ")
    color = "black"
    
    image_data = process_prediction(img_path, threshold)
    
    prediction = image_data[0][2]
    prediction = prediction[0]
    
    imagenetdog  = image_data[1][0]
    imagenetpred = image_data[1][1][0][0]
    imagenetname = image_data[1][1][1][0]
    
    # image_data[2] = [isdog, ishuman, isunknown, isFromDogData, isFromHumanData]
    if image_data[2][0] == True and image_data[2][3] == True:
        message1 = "You are a dog for sure!"
        message2 = ":)"
    elif image_data[2][0] == True and image_data[2][1] == True:
        message1 = str(round(prediction * 100)) + "% like the guy on the right!" 
        message2 = "but, maybe you are people..."
    elif image_data[2][0] == False and image_data[2][1] == True:
        message1 = "Look like a human, yay!"
        message2 = "the lucky dog looks like you!"
    elif (prediction <= threshold) or (imagenetpred not in range(151, 269)):
        message1 = "Somthing doggy about you..."
        message2 = "not sure what kind of dog!"
        if imagenetdog == False:
            if imagenetpred in range(269,280): # wolves etc.
                message2 = "maybe a cousin of sorts?"
            elif imagenetpred in range(280,294): # cats etc.
                message2 = "some cat disagrees :)"
            elif imagenetpred in range(294,298): # bears etc.    
                message2 = "can you bear the thought?"
            elif imagenetpred in range(365,385): # primates
                message2 = "just monkeying around!"
            elif imagenetpred >= 398: # man-made stuff...
                message2 = "but that doesn't seem possible!"
            else:
                message2 = "but nobody is perfect!"
    elif image_data[2][0] == True and prediction > threshold:
        message1 = "Look like a dog for sure..."
        message2 = "!!!"
    elif image_data[2][0] == False:
        message1 = "What are you?"
        message2 = "??????"
    elif image_data[2][2] == True:
        message1 = "I know you're unknown..."
        message2 = "How can that be?"
    else:
        message1 = "I know nothing!"
        message2 = "really..."
        
    _, axes = plt.subplots(figsize=(20,6), ncols=3)

    for ii in range(3):
        ax = axes[ii]
        if ii == 0:
            img = image_data[3]
            title = subject_name
            filename =  img_name
        elif ii == 1:
            newimg = Image.new('RGB', (224, 224), (255, 255, 255))
            fnt = ImageFont.truetype('images/calibri.ttf', 18)
            d = ImageDraw.Draw(newimg)
            d.text((4, 20), message1, font=fnt, fill=(0, 0, 0))
            d.text((5, 50), message2, font=fnt, fill=(0, 0, 0))
            newimg.save('images/textmsg.jpg')
            img = cv2.imread('images/textmsg.jpg')
            title = ""
            filename = ""
        else:
            img = image_data[4]
            title = image_data[0][0]
            title = title[0]
            filename = ""

        ax.imshow(img)    
        ax.tick_params(axis='both', length=0)
        ax.set_xticklabels('')
        ax.set_yticklabels('')
        
        ax.set_title(title, color=color)
        ax.set_xlabel(filename)
        if ii == 1:
            ax.set_axis_off()
            
    if os.path.isfile('images/textmsg.jpg'):    
        os.remove('images/textmsg.jpg') 
In [145]:
run_app('/data/dog_images/test/011.Australian_cattle_dog/Australian_cattle_dog_00728.jpg')

Step 6: Test Your Algorithm

In this section, you will take your new algorithm for a spin! What kind of dog does the algorithm think that you look like? If you have a dog, does it predict your dog's breed accurately? If you have a cat, does it mistakenly think that your cat is a dog?

(IMPLEMENTATION) Test Your Algorithm on Sample Images!

Test your algorithm on at least six images on your computer. Feel free to use any images you like. Use at least two human and two dog images.

Question 6: Is the output better than you expected :) ? Or worse :( ? Provide at least three possible points of improvement for your algorithm.

Answer:

The output is pretty much what I expected:

  • the accuracy of the dog breed classifier exceeds my expectations, with a 90% accuracy on test data.
  • the incoming image is shown on the left, with an example image of the predicted dog breed on the right.
  • I show the incoming name derived from path information on the incoming image title, together with the filename and parent path as an axis label.
  • I show the name of the predicted dog class as the title of the predicted dog image.
  • the dog breed classifier will always return dog probabilities no matter what it's given, but I use a probability threshold (default 35%) to decide whether the image is likely to be a dog or something else.
  • the face detector is far from perfect, but if it does detect a face then I consider the possibility that the image is of a human.
  • if both dog and human are detected but the dog probability is above the threshold, then I assume it's a dog, give the probability, and allude to the additional detection of a human. This, in my opinion, is more fun than just saying human, as it emphasises that the model really thought it was seeing a dog!
  • if both dog and human are detected but the dog probability is below the threshold, then I assume it's a human but indicate that the dog looks like the human.
  • I use ancillary information - the predictions of a standard ImageNet resnet model, and the embedded path information of the image, to inform the message variations, but the predictions derive only from model_transfer.
  • if the dog prediction is not certain and a human is not detected then I make an observation in some cases (wolves, cats, bears, monkeys) that indicates that a dog is really unlikely.
  • other permutations are handled to indicate degrees of certainty, including unknown should it occur.

Points of improvement:

  • The face detector is poor; if it were firmly capable of detecting faces, I could eliminate the uncertainty around human detection, which at the moment I try to resolve with the ImageNet classifier and a somewhat complex decision-making process.
  • The major weakness of a dog classifier is that it can only predict a dog breed, no matter what it is given. We need it to consider humans as dogs so we can say what kind of dog they resemble, but it would be an improvement if the classifier could handle the likeness comparison and still return "human". I work around this by considering probabilities and then consulting another classifier in run_app, but ideally this would be handled by the dog classifier itself.
  • Additionally, the code itself could be moved into a Python class to make it more modular.
  • I could report where a dog prediction differs from the incoming dog image's true breed, deduced from the directory name.
  • Likewise, I could be more specific about what I think is really in the image when it's not a dog.
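The threshold-based decision process described in the answer can be sketched roughly as follows. This is a simplified, hypothetical version for illustration only: the function name, inputs (dog_prob, face_detected), and message strings are my own, and the full logic, including the ImageNet consultation and message variations, lives in run_app.

```python
def classify_image(dog_prob, face_detected, threshold=0.35):
    """Simplified sketch of the dog/human decision logic (illustrative only).

    dog_prob      : top breed probability from the dog classifier (0..1)
    face_detected : whether the face detector found a human face
    threshold     : minimum probability to treat the image as a dog
    """
    if dog_prob >= threshold:
        # Trust the breed classifier, but mention a face if one was found.
        if face_detected:
            return 'dog (a human face was also detected)'
        return 'dog'
    if face_detected:
        # Low dog probability plus a face: treat as a human lookalike.
        return 'human (resembling the predicted breed)'
    # Neither signal is convincing.
    return 'neither'
```

The actual implementation adds further cases (wolves, cats, bears, monkeys, and an unknown fallback) on top of this skeleton.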
In [146]:
## TODO: Execute your algorithm from Step 6 on
## at least 6 images on your computer.
## Feel free to use as many code cells as needed.

## suggested code, below
for file in np.hstack((human_files[0], human_files[2],
                       '/data/lfw/Aaron_Guiel/Aaron_Guiel_0001.jpg',
                       dog_files[0], dog_files[80], dog_files[200])):
    run_app(file)
In [147]:
for file in np.hstack(('images/Chris_HS_square.jpg', 
                       'images/Curly-coated_retriever_03896.jpg',
                       'images/American_water_spaniel_00648.jpg',
                       'images/Brittany_02625.jpg',
                       'images/Welsh_springer_spaniel_08203.jpg',
                       'images/cat.69.jpg',
                       'images/cat.3.jpg',
                       'images/cat.10.jpg',
                       'images/black_bear.jpg',
                       'images/gorilla.jpg',
                       'images/chimp.jpg',
                       'images/that_monkey.jpg',
                       'images/monkeys.jpg',
                       'images/goldfish.jpg',
                       'images/wolf_1.jpg',
                       'images/wolf_2.jpg',
                       'images/Labrador_retriever_06455.jpg', 
                       'images/Labrador_retriever_06449.jpg',
                       'images/Labrador_retriever_06455.jpg',
                       'images/Labrador_retriever_06457.jpg'
                      )):
    run_app(file)

Utilities

In [148]:
%%javascript

// Sourced from http://nbviewer.jupyter.org/gist/minrk/5d0946d39d511d9e0b5a

$("#renumber-button").parent().remove();

function renumber() {
    // renumber cells in order
    var i=1;
    IPython.notebook.get_cells().map(function (cell) {
        if (cell.cell_type == 'code') {
            // set the input prompt
            cell.set_input_prompt(i);
            // set the output prompt (in two places)
            cell.output_area.outputs.map(function (output) {
                if (output.output_type == 'execute_result') {
                    output.execution_count = i;
                    cell.element.find(".output_prompt").text('Out[' + i + ']:');
                }
            });
            i += 1;
        }
    });
}

IPython.toolbar.add_buttons_group([{
  'label'   : 'Renumber',
  'icon'    : 'fa-list-ol',
  'callback': renumber,
  'id'      : 'renumber-button'
}]);

Appendix

Test the appearance of different resolutions to find the lowest usable value for the dataloader...

In [149]:
imshow(img_process(train_dir+'001.Affenpinscher/Affenpinscher_00001.jpg', img_sz=100), 
       title='Affenpinscher')
Out[149]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f2193a470>
In [150]:
imshow(img_process(train_dir+'001.Affenpinscher/Affenpinscher_00001.jpg', img_sz=150), 
       title='Affenpinscher')
Out[150]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f2193ff60>
In [151]:
imshow(img_process(train_dir+'001.Affenpinscher/Affenpinscher_00001.jpg', img_sz=180), 
       title='Affenpinscher')
Out[151]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f218e2438>
In [152]:
imshow(img_process(train_dir+'001.Affenpinscher/Affenpinscher_00001.jpg', img_sz=200), 
       title='Affenpinscher')
Out[152]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f21845390>
In [153]:
imshow(img_process(train_dir+'001.Affenpinscher/Affenpinscher_00001.jpg', img_sz=224), 
       title='Affenpinscher')
Out[153]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f217aca58>

Other scratch models tried...

3 layer CNN

  • this was the most efficient approach to get a result (though there must be better!)
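The scratch models below reference AdaptiveConcatPool2d and Flatten helpers defined earlier in the notebook. For reference, a minimal sketch of what such modules typically look like (fastai-style concat pooling; this is an assumption about the earlier definitions, not a copy of them):

```python
import torch
import torch.nn as nn

class AdaptiveConcatPool2d(nn.Module):
    """Concatenate adaptive average- and max-pooling along the channel axis.

    Doubles the channel count, which is why the classifier heads below
    use 96*2 (or 128*2) input features.
    """
    def __init__(self, output_size=1):
        super().__init__()
        self.ap = nn.AdaptiveAvgPool2d(output_size)
        self.mp = nn.AdaptiveMaxPool2d(output_size)

    def forward(self, x):
        return torch.cat([self.mp(x), self.ap(x)], dim=1)

class Flatten(nn.Module):
    """Flatten all dimensions except the batch dimension."""
    def forward(self, x):
        return x.view(x.size(0), -1)
```

For a 96-channel feature map, the pooled and flattened output has 192 features per example, matching the BatchNorm1d(192) seen in the printed models.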
In [ ]:
del model_scratch
In [ ]:
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    ### TODO: choose an architecture, and complete the class
    def __init__(self, num_classes=133):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 96, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.classifier = nn.Sequential(
            AdaptiveConcatPool2d(),
            Flatten(),
            nn.BatchNorm1d(96*2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=96*2, out_features=512, bias=True),
            nn.ReLU(),
            nn.BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=512, out_features=num_classes, bias=True)
        )    
        
        
    def forward(self, x):
        ## Define forward behavior
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.classifier(x)

        return x

#-#-# You do NOT have to modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()    
In [ ]:
print(model_scratch)

3 layer CNN with batchnorm

  • this was a little slower to reach the same result as the vanilla 3 layer model
  • the loss decreased more smoothly during training, but the final accuracy was about one point lower
In [ ]:
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    ### TODO: choose an architecture, and complete the class
    def __init__(self, num_classes=133):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 96, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True), 
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.classifier = nn.Sequential(
            AdaptiveConcatPool2d(),
            Flatten(),
            nn.BatchNorm1d(96*2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=96*2, out_features=512, bias=True),
            nn.ReLU(),
            nn.BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=512, out_features=num_classes, bias=True)
        )    
        
        
    def forward(self, x):
        ## Define forward behavior
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.classifier(x)

        return x

#-#-# You do NOT have to modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()
In [ ]:
print(model_scratch)
Net(
  (layer1): Sequential(
    (0): Conv2d(3, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
    (1): ReLU()
    (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer3): Sequential(
    (0): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=(1, 1))
      (mp): AdaptiveMaxPool2d(output_size=(1, 1))
    )
    (1): Flatten()
    (2): BatchNorm1d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25)
    (4): Linear(in_features=192, out_features=512, bias=True)
    (5): ReLU()
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.25)
    (8): Linear(in_features=512, out_features=133, bias=True)
  )
)

6 layer CNN

  • less useful than the simpler design: it took longer to train and achieved lower accuracy for the same number of epochs
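A quick parameter count helps explain why the deeper variants train more slowly. This helper is not in the notebook; it is a standard PyTorch idiom shown for illustration:

```python
import torch.nn as nn

def count_parameters(model):
    """Total number of trainable parameters in a PyTorch module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g. nn.Linear(10, 5) has 10*5 weights + 5 biases = 55 parameters
```

Calling count_parameters(model_scratch) on each candidate architecture gives a rough proxy for training cost before committing GPU time.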
In [ ]:
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    ### TODO: choose an architecture, and complete the class
    def __init__(self, num_classes=133):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2, bias=False),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=5, stride=1, padding=2, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 96, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True), 
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.classifier = nn.Sequential(
            AdaptiveConcatPool2d(),
            Flatten(),
            nn.BatchNorm1d(96*2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=96*2, out_features=512, bias=True),
            nn.ReLU(),
            nn.BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=512, out_features=num_classes, bias=True)
        )    
        
        
    def forward(self, x):
        ## Define forward behavior
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.classifier(x)

        return x

#-#-# You do NOT have to modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()
In [ ]:
print(model_scratch)
Net(
  (layer1): Sequential(
    (0): Conv2d(3, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
    (1): ReLU()
    (2): Conv2d(32, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
    (3): ReLU()
    (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (3): ReLU()
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer3): Sequential(
    (0): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (3): ReLU()
    (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=(1, 1))
      (mp): AdaptiveMaxPool2d(output_size=(1, 1))
    )
    (1): Flatten()
    (2): BatchNorm1d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25)
    (4): Linear(in_features=192, out_features=512, bias=True)
    (5): ReLU()
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.25)
    (8): Linear(in_features=512, out_features=133, bias=True)
  )
)

4 layer CNN

  • this failed to make any progress towards convergence, so I abandoned training it
In [ ]:
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    ### TODO: choose an architecture, and complete the class
    def __init__(self, num_classes=133):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 96, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(96, 128, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        self.classifier = nn.Sequential(
            AdaptiveConcatPool2d(),
            Flatten(),
            nn.BatchNorm1d(128*2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=128*2, out_features=512, bias=True),
            nn.ReLU(),
            nn.BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.Dropout(p=0.25),
            nn.Linear(in_features=512, out_features=num_classes, bias=True)
        )    
        
        
    def forward(self, x):
        ## Define forward behavior
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.classifier(x)

        return x

#-#-# You do NOT have to modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()
In [ ]:
print(model_scratch)
Net(
  (layer1): Sequential(
    (0): Conv2d(3, 32, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer3): Sequential(
    (0): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer4): Sequential(
    (0): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=(1, 1))
      (mp): AdaptiveMaxPool2d(output_size=(1, 1))
    )
    (1): Flatten()
    (2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25)
    (4): Linear(in_features=256, out_features=512, bias=True)
    (5): ReLU()
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.25)
    (8): Linear(in_features=512, out_features=133, bias=True)
  )
)

3 layer CNN (48-96-128 configuration)

Output of the first 300-epoch training run:

Epoch: 1    Training Loss: 4.961590     Validation Loss: 4.919203
Epoch: 1    Training Accuracy: 0.007037     Validation Accuracy: 0.013174
Accuracy has increased (0.000000 --> 0.013174)  Saving model as model_scratch_acc...
Validation loss decreased (inf --> 4.910386)  Saving model as model_scratch_val...

Epoch: 2    Training Loss: 4.916895     Validation Loss: 4.887988
Epoch: 2    Training Accuracy: 0.010930     Validation Accuracy: 0.015569
Accuracy has increased (0.013174 --> 0.015569)  Saving model as model_scratch_acc...
Validation loss decreased (4.910386 --> 4.883724)  Saving model as model_scratch_val...

Epoch: 3    Training Loss: 4.899519     Validation Loss: 4.852184
Epoch: 3    Training Accuracy: 0.010630     Validation Accuracy: 0.016766
Accuracy has increased (0.015569 --> 0.016766)  Saving model as model_scratch_acc...
Validation loss decreased (4.883724 --> 4.853319)  Saving model as model_scratch_val...

Epoch: 4    Training Loss: 4.865728     Validation Loss: 4.833315
Epoch: 4    Training Accuracy: 0.015871     Validation Accuracy: 0.015569
Validation loss decreased (4.853319 --> 4.833284)  Saving model as model_scratch_val...

Epoch: 5    Training Loss: 4.833236     Validation Loss: 4.808116
Epoch: 5    Training Accuracy: 0.017068     Validation Accuracy: 0.026347
Accuracy has increased (0.016766 --> 0.026347)  Saving model as model_scratch_acc...
Validation loss decreased (4.833284 --> 4.811524)  Saving model as model_scratch_val...

Epoch: 6    Training Loss: 4.810703     Validation Loss: 4.801436
Epoch: 6    Training Accuracy: 0.017817     Validation Accuracy: 0.021557
Validation loss decreased (4.811524 --> 4.792679)  Saving model as model_scratch_val...

Epoch: 7    Training Loss: 4.787861     Validation Loss: 4.792127
Epoch: 7    Training Accuracy: 0.022758     Validation Accuracy: 0.022754
Validation loss decreased (4.792679 --> 4.779539)  Saving model as model_scratch_val...

Epoch: 8    Training Loss: 4.767632     Validation Loss: 4.769209
Epoch: 8    Training Accuracy: 0.023507     Validation Accuracy: 0.026347
Validation loss decreased (4.779539 --> 4.762892)  Saving model as model_scratch_val...

Epoch: 9    Training Loss: 4.738388     Validation Loss: 4.745911
Epoch: 9    Training Accuracy: 0.029046     Validation Accuracy: 0.031138
Accuracy has increased (0.026347 --> 0.031138)  Saving model as model_scratch_acc...
Validation loss decreased (4.762892 --> 4.749212)  Saving model as model_scratch_val...

Epoch: 10   Training Loss: 4.730064     Validation Loss: 4.723873
Epoch: 10   Training Accuracy: 0.027549     Validation Accuracy: 0.026347
Validation loss decreased (4.749212 --> 4.734878)  Saving model as model_scratch_val...

Epoch: 11   Training Loss: 4.707745     Validation Loss: 4.721910
Epoch: 11   Training Accuracy: 0.028897     Validation Accuracy: 0.025150
Validation loss decreased (4.734878 --> 4.717407)  Saving model as model_scratch_val...

Epoch: 12   Training Loss: 4.694037     Validation Loss: 4.694993
Epoch: 12   Training Accuracy: 0.033837     Validation Accuracy: 0.028743
Validation loss decreased (4.717407 --> 4.701353)  Saving model as model_scratch_val...

Epoch: 13   Training Loss: 4.669737     Validation Loss: 4.695582
Epoch: 13   Training Accuracy: 0.034287     Validation Accuracy: 0.025150
Validation loss decreased (4.701353 --> 4.694014)  Saving model as model_scratch_val...

Epoch: 14   Training Loss: 4.661658     Validation Loss: 4.669526
Epoch: 14   Training Accuracy: 0.033987     Validation Accuracy: 0.034731
Accuracy has increased (0.031138 --> 0.034731)  Saving model as model_scratch_acc...
Validation loss decreased (4.694014 --> 4.678348)  Saving model as model_scratch_val...

Epoch: 15   Training Loss: 4.642054     Validation Loss: 4.651555
Epoch: 15   Training Accuracy: 0.036832     Validation Accuracy: 0.034731
Validation loss decreased (4.678348 --> 4.658789)  Saving model as model_scratch_val...

Epoch: 16   Training Loss: 4.623108     Validation Loss: 4.661414
Epoch: 16   Training Accuracy: 0.038928     Validation Accuracy: 0.032335
Validation loss decreased (4.658789 --> 4.649315)  Saving model as model_scratch_val...

Epoch: 17   Training Loss: 4.616818     Validation Loss: 4.629588
Epoch: 17   Training Accuracy: 0.039377     Validation Accuracy: 0.034731
Validation loss decreased (4.649315 --> 4.634639)  Saving model as model_scratch_val...

Epoch: 18   Training Loss: 4.593741     Validation Loss: 4.654449
Epoch: 18   Training Accuracy: 0.041773     Validation Accuracy: 0.037126
Accuracy has increased (0.034731 --> 0.037126)  Saving model as model_scratch_acc...
Validation loss decreased (4.634639 --> 4.623248)  Saving model as model_scratch_val...

Epoch: 19   Training Loss: 4.585801     Validation Loss: 4.602505
Epoch: 19   Training Accuracy: 0.042671     Validation Accuracy: 0.041916
Accuracy has increased (0.037126 --> 0.041916)  Saving model as model_scratch_acc...
Validation loss decreased (4.623248 --> 4.607734)  Saving model as model_scratch_val...

Epoch: 20   Training Loss: 4.566946     Validation Loss: 4.613661
Epoch: 20   Training Accuracy: 0.046714     Validation Accuracy: 0.039521
Validation loss decreased (4.607734 --> 4.599737)  Saving model as model_scratch_val...

Epoch: 21   Training Loss: 4.550723     Validation Loss: 4.567756
Epoch: 21   Training Accuracy: 0.045366     Validation Accuracy: 0.038323
Validation loss decreased (4.599737 --> 4.578774)  Saving model as model_scratch_val...

Epoch: 22   Training Loss: 4.538908     Validation Loss: 4.575675
Epoch: 22   Training Accuracy: 0.044767     Validation Accuracy: 0.044311
Accuracy has increased (0.041916 --> 0.044311)  Saving model as model_scratch_acc...
Validation loss decreased (4.578774 --> 4.571023)  Saving model as model_scratch_val...

Epoch: 23   Training Loss: 4.525758     Validation Loss: 4.562220
Epoch: 23   Training Accuracy: 0.049858     Validation Accuracy: 0.046707
Accuracy has increased (0.044311 --> 0.046707)  Saving model as model_scratch_acc...
Validation loss decreased (4.571023 --> 4.561492)  Saving model as model_scratch_val...

Epoch: 24   Training Loss: 4.517275     Validation Loss: 4.533606
Epoch: 24   Training Accuracy: 0.049558     Validation Accuracy: 0.051497
Accuracy has increased (0.046707 --> 0.051497)  Saving model as model_scratch_acc...
Validation loss decreased (4.561492 --> 4.544248)  Saving model as model_scratch_val...

Epoch: 25   Training Loss: 4.502127     Validation Loss: 4.542381
Epoch: 25   Training Accuracy: 0.049708     Validation Accuracy: 0.052695
Accuracy has increased (0.051497 --> 0.052695)  Saving model as model_scratch_acc...
Validation loss decreased (4.544248 --> 4.538356)  Saving model as model_scratch_val...

Epoch: 26   Training Loss: 4.494338     Validation Loss: 4.532583
Epoch: 26   Training Accuracy: 0.050007     Validation Accuracy: 0.058683
Accuracy has increased (0.052695 --> 0.058683)  Saving model as model_scratch_acc...
Validation loss decreased (4.538356 --> 4.520129)  Saving model as model_scratch_val...

Epoch: 27   Training Loss: 4.477561     Validation Loss: 4.495865
Epoch: 27   Training Accuracy: 0.047911     Validation Accuracy: 0.049102
Validation loss decreased (4.520129 --> 4.508329)  Saving model as model_scratch_val...

Epoch: 28   Training Loss: 4.458783     Validation Loss: 4.506287
Epoch: 28   Training Accuracy: 0.055996     Validation Accuracy: 0.056287
Validation loss decreased (4.508329 --> 4.500564)  Saving model as model_scratch_val...

Epoch: 29   Training Loss: 4.460913     Validation Loss: 4.486317
Epoch: 29   Training Accuracy: 0.057643     Validation Accuracy: 0.057485
Validation loss decreased (4.500564 --> 4.485874)  Saving model as model_scratch_val...

Epoch: 30   Training Loss: 4.441758     Validation Loss: 4.479993
Epoch: 30   Training Accuracy: 0.057344     Validation Accuracy: 0.064671
Accuracy has increased (0.058683 --> 0.064671)  Saving model as model_scratch_acc...
Validation loss decreased (4.485874 --> 4.467017)  Saving model as model_scratch_val...

Epoch: 31   Training Loss: 4.433699     Validation Loss: 4.470017
Epoch: 31   Training Accuracy: 0.057044     Validation Accuracy: 0.062275
Validation loss decreased (4.467017 --> 4.455606)  Saving model as model_scratch_val...

Epoch: 32   Training Loss: 4.414536     Validation Loss: 4.455982
Epoch: 32   Training Accuracy: 0.059590     Validation Accuracy: 0.064671
Validation loss decreased (4.455606 --> 4.438676)  Saving model as model_scratch_val...

Epoch: 33   Training Loss: 4.412766     Validation Loss: 4.416891
Epoch: 33   Training Accuracy: 0.059889     Validation Accuracy: 0.065868
Accuracy has increased (0.064671 --> 0.065868)  Saving model as model_scratch_acc...
Validation loss decreased (4.438676 --> 4.425948)  Saving model as model_scratch_val...

Epoch: 34   Training Loss: 4.394729     Validation Loss: 4.416844
Epoch: 34   Training Accuracy: 0.062135     Validation Accuracy: 0.063473
Validation loss decreased (4.425948 --> 4.418158)  Saving model as model_scratch_val...

Epoch: 35   Training Loss: 4.383926     Validation Loss: 4.420107
Epoch: 35   Training Accuracy: 0.060937     Validation Accuracy: 0.069461
Accuracy has increased (0.065868 --> 0.069461)  Saving model as model_scratch_acc...
Validation loss decreased (4.418158 --> 4.415691)  Saving model as model_scratch_val...

Epoch: 36   Training Loss: 4.381893     Validation Loss: 4.400412
Epoch: 36   Training Accuracy: 0.061536     Validation Accuracy: 0.067066
Validation loss decreased (4.415691 --> 4.401057)  Saving model as model_scratch_val...

Epoch: 37   Training Loss: 4.362713     Validation Loss: 4.398118
Epoch: 37   Training Accuracy: 0.064531     Validation Accuracy: 0.065868
Validation loss decreased (4.401057 --> 4.387594)  Saving model as model_scratch_val...

Epoch: 38   Training Loss: 4.353604     Validation Loss: 4.380332
Epoch: 38   Training Accuracy: 0.066776     Validation Accuracy: 0.070659
Accuracy has increased (0.069461 --> 0.070659)  Saving model as model_scratch_acc...
Validation loss decreased (4.387594 --> 4.375505)  Saving model as model_scratch_val...

Epoch: 39   Training Loss: 4.341194     Validation Loss: 4.360950
Epoch: 39   Training Accuracy: 0.064531     Validation Accuracy: 0.067066
Validation loss decreased (4.375505 --> 4.374117)  Saving model as model_scratch_val...

Epoch: 40   Training Loss: 4.320673     Validation Loss: 4.373131
Epoch: 40   Training Accuracy: 0.066627     Validation Accuracy: 0.065868
Validation loss decreased (4.374117 --> 4.350564)  Saving model as model_scratch_val...

Epoch: 41   Training Loss: 4.320134     Validation Loss: 4.347889
Epoch: 41   Training Accuracy: 0.072765     Validation Accuracy: 0.071856
Accuracy has increased (0.070659 --> 0.071856)  Saving model as model_scratch_acc...

Epoch: 42   Training Loss: 4.312190     Validation Loss: 4.330526
Epoch: 42   Training Accuracy: 0.070370     Validation Accuracy: 0.075449
Accuracy has increased (0.071856 --> 0.075449)  Saving model as model_scratch_acc...
Validation loss decreased (4.350564 --> 4.342403)  Saving model as model_scratch_val...

Epoch: 43   Training Loss: 4.290071     Validation Loss: 4.336576
Epoch: 43   Training Accuracy: 0.077706     Validation Accuracy: 0.070659
Validation loss decreased (4.342403 --> 4.333960)  Saving model as model_scratch_val...

Epoch: 44   Training Loss: 4.282736     Validation Loss: 4.314463
Epoch: 44   Training Accuracy: 0.071568     Validation Accuracy: 0.079042
Accuracy has increased (0.075449 --> 0.079042)  Saving model as model_scratch_acc...
Validation loss decreased (4.333960 --> 4.317463)  Saving model as model_scratch_val...

Epoch: 45   Training Loss: 4.269857     Validation Loss: 4.308572
Epoch: 45   Training Accuracy: 0.072915     Validation Accuracy: 0.081437
Accuracy has increased (0.079042 --> 0.081437)  Saving model as model_scratch_acc...
Validation loss decreased (4.317463 --> 4.300952)  Saving model as model_scratch_val...

Epoch: 46   Training Loss: 4.261287     Validation Loss: 4.287661
Epoch: 46   Training Accuracy: 0.078305     Validation Accuracy: 0.076647
Validation loss decreased (4.300952 --> 4.288138)  Saving model as model_scratch_val...

Epoch: 47   Training Loss: 4.241984     Validation Loss: 4.285376
Epoch: 47   Training Accuracy: 0.082647     Validation Accuracy: 0.083832
Accuracy has increased (0.081437 --> 0.083832)  Saving model as model_scratch_acc...
Validation loss decreased (4.288138 --> 4.283875)  Saving model as model_scratch_val...

Epoch: 48   Training Loss: 4.248154     Validation Loss: 4.265169
Epoch: 48   Training Accuracy: 0.076808     Validation Accuracy: 0.089820
Accuracy has increased (0.083832 --> 0.089820)  Saving model as model_scratch_acc...
Validation loss decreased (4.283875 --> 4.275203)  Saving model as model_scratch_val...

Epoch: 49   Training Loss: 4.221133     Validation Loss: 4.234239
Epoch: 49   Training Accuracy: 0.089235     Validation Accuracy: 0.086228
Validation loss decreased (4.275203 --> 4.253997)  Saving model as model_scratch_val...

Epoch: 50   Training Loss: 4.227645     Validation Loss: 4.225454
Epoch: 50   Training Accuracy: 0.080252     Validation Accuracy: 0.079042
Validation loss decreased (4.253997 --> 4.245283)  Saving model as model_scratch_val...

Epoch: 51   Training Loss: 4.210372     Validation Loss: 4.242854
Epoch: 51   Training Accuracy: 0.082048     Validation Accuracy: 0.087425
Validation loss decreased (4.245283 --> 4.233133)  Saving model as model_scratch_val...

Epoch: 52   Training Loss: 4.196611     Validation Loss: 4.255041
Epoch: 52   Training Accuracy: 0.083845     Validation Accuracy: 0.079042
Validation loss decreased (4.233133 --> 4.226763)  Saving model as model_scratch_val...

Epoch: 53   Training Loss: 4.188484     Validation Loss: 4.217831
Epoch: 53   Training Accuracy: 0.085043     Validation Accuracy: 0.091018
Accuracy has increased (0.089820 --> 0.091018)  Saving model as model_scratch_acc...
Validation loss decreased (4.226763 --> 4.216272)  Saving model as model_scratch_val...

Epoch: 54   Training Loss: 4.182420     Validation Loss: 4.210487
Epoch: 54   Training Accuracy: 0.085642     Validation Accuracy: 0.089820
Validation loss decreased (4.216272 --> 4.208432)  Saving model as model_scratch_val...

Epoch: 55   Training Loss: 4.164401     Validation Loss: 4.203396
Epoch: 55   Training Accuracy: 0.089534     Validation Accuracy: 0.086228
Validation loss decreased (4.208432 --> 4.195600)  Saving model as model_scratch_val...

Epoch: 56   Training Loss: 4.148393     Validation Loss: 4.191838
Epoch: 56   Training Accuracy: 0.094176     Validation Accuracy: 0.085030
Validation loss decreased (4.195600 --> 4.191843)  Saving model as model_scratch_val...

Epoch: 57   Training Loss: 4.146093     Validation Loss: 4.199538
Epoch: 57   Training Accuracy: 0.092978     Validation Accuracy: 0.088623
Validation loss decreased (4.191843 --> 4.176208)  Saving model as model_scratch_val...

Epoch: 58   Training Loss: 4.135556     Validation Loss: 4.178194
Epoch: 58   Training Accuracy: 0.090133     Validation Accuracy: 0.088623
Validation loss decreased (4.176208 --> 4.158777)  Saving model as model_scratch_val...

Epoch: 59   Training Loss: 4.125721     Validation Loss: 4.156998
Epoch: 59   Training Accuracy: 0.089235     Validation Accuracy: 0.088623
Validation loss decreased (4.158777 --> 4.156878)  Saving model as model_scratch_val...

Epoch: 60   Training Loss: 4.127804     Validation Loss: 4.157080
Epoch: 60   Training Accuracy: 0.094475     Validation Accuracy: 0.089820
Validation loss decreased (4.156878 --> 4.145257)  Saving model as model_scratch_val...

Epoch: 61   Training Loss: 4.107591     Validation Loss: 4.136007
Epoch: 61   Training Accuracy: 0.104207     Validation Accuracy: 0.087425
Validation loss decreased (4.145257 --> 4.131725)  Saving model as model_scratch_val...

Epoch: 62   Training Loss: 4.087030     Validation Loss: 4.135820
Epoch: 62   Training Accuracy: 0.098368     Validation Accuracy: 0.094611
Accuracy has increased (0.091018 --> 0.094611)  Saving model as model_scratch_acc...
Validation loss decreased (4.131725 --> 4.124839)  Saving model as model_scratch_val...

Epoch: 63   Training Loss: 4.083349     Validation Loss: 4.103133
Epoch: 63   Training Accuracy: 0.097619     Validation Accuracy: 0.100599
Accuracy has increased (0.094611 --> 0.100599)  Saving model as model_scratch_acc...
Validation loss decreased (4.124839 --> 4.118589)  Saving model as model_scratch_val...

Epoch: 64   Training Loss: 4.082498     Validation Loss: 4.093785
Epoch: 64   Training Accuracy: 0.098967     Validation Accuracy: 0.088623
Validation loss decreased (4.118589 --> 4.107300)  Saving model as model_scratch_val...

Epoch: 65   Training Loss: 4.063479     Validation Loss: 4.089746
Epoch: 65   Training Accuracy: 0.105255     Validation Accuracy: 0.097006
Validation loss decreased (4.107300 --> 4.095206)  Saving model as model_scratch_val...

Epoch: 66   Training Loss: 4.047646     Validation Loss: 4.098458
Epoch: 66   Training Accuracy: 0.107651     Validation Accuracy: 0.097006
Validation loss decreased (4.095206 --> 4.089656)  Saving model as model_scratch_val...

Epoch: 67   Training Loss: 4.052424     Validation Loss: 4.086577
Epoch: 67   Training Accuracy: 0.104357     Validation Accuracy: 0.095808
Validation loss decreased (4.089656 --> 4.069397)  Saving model as model_scratch_val...

Epoch: 68   Training Loss: 4.044847     Validation Loss: 4.064115
Epoch: 68   Training Accuracy: 0.108399     Validation Accuracy: 0.098204
Validation loss decreased (4.069397 --> 4.067373)  Saving model as model_scratch_val...

Epoch: 69   Training Loss: 4.023386     Validation Loss: 4.050792
Epoch: 69   Training Accuracy: 0.104656     Validation Accuracy: 0.101796
Accuracy has increased (0.100599 --> 0.101796)  Saving model as model_scratch_acc...
Validation loss decreased (4.067373 --> 4.047521)  Saving model as model_scratch_val...

Epoch: 70   Training Loss: 4.025335     Validation Loss: 4.032914
Epoch: 70   Training Accuracy: 0.105854     Validation Accuracy: 0.105389
Accuracy has increased (0.101796 --> 0.105389)  Saving model as model_scratch_acc...
Validation loss decreased (4.047521 --> 4.037204)  Saving model as model_scratch_val...

Epoch: 71   Training Loss: 4.004679     Validation Loss: 4.050064
Epoch: 71   Training Accuracy: 0.107501     Validation Accuracy: 0.097006
Validation loss decreased (4.037204 --> 4.035535)  Saving model as model_scratch_val...

Epoch: 72   Training Loss: 4.009485     Validation Loss: 4.044646
Epoch: 72   Training Accuracy: 0.110645     Validation Accuracy: 0.108982
Accuracy has increased (0.105389 --> 0.108982)  Saving model as model_scratch_acc...
Validation loss decreased (4.035535 --> 4.018070)  Saving model as model_scratch_val...

Epoch: 73   Training Loss: 3.990971     Validation Loss: 4.020820
Epoch: 73   Training Accuracy: 0.113789     Validation Accuracy: 0.107784

Epoch: 74   Training Loss: 3.977431     Validation Loss: 4.019989
Epoch: 74   Training Accuracy: 0.111394     Validation Accuracy: 0.101796
Validation loss decreased (4.018070 --> 4.007282)  Saving model as model_scratch_val...

Epoch: 75   Training Loss: 3.983584     Validation Loss: 3.976029
Epoch: 75   Training Accuracy: 0.112292     Validation Accuracy: 0.113772
Accuracy has increased (0.108982 --> 0.113772)  Saving model as model_scratch_acc...
Validation loss decreased (4.007282 --> 3.984974)  Saving model as model_scratch_val...

Epoch: 76   Training Loss: 3.971886     Validation Loss: 3.977511
Epoch: 76   Training Accuracy: 0.112442     Validation Accuracy: 0.108982

Epoch: 77   Training Loss: 3.952858     Validation Loss: 3.961319
Epoch: 77   Training Accuracy: 0.111394     Validation Accuracy: 0.107784
Validation loss decreased (3.984974 --> 3.974182)  Saving model as model_scratch_val...

Epoch: 78   Training Loss: 3.959173     Validation Loss: 3.952832
Epoch: 78   Training Accuracy: 0.111993     Validation Accuracy: 0.116168
Accuracy has increased (0.113772 --> 0.116168)  Saving model as model_scratch_acc...
Validation loss decreased (3.974182 --> 3.962397)  Saving model as model_scratch_val...

Epoch: 79   Training Loss: 3.945993     Validation Loss: 3.956717
Epoch: 79   Training Accuracy: 0.121575     Validation Accuracy: 0.108982
Validation loss decreased (3.962397 --> 3.951301)  Saving model as model_scratch_val...

Epoch: 80   Training Loss: 3.927968     Validation Loss: 3.978616
Epoch: 80   Training Accuracy: 0.122773     Validation Accuracy: 0.122156
Accuracy has increased (0.116168 --> 0.122156)  Saving model as model_scratch_acc...

Epoch: 81   Training Loss: 3.924944     Validation Loss: 3.926396
Epoch: 81   Training Accuracy: 0.129810     Validation Accuracy: 0.118563
Validation loss decreased (3.951301 --> 3.937121)  Saving model as model_scratch_val...

Epoch: 82   Training Loss: 3.919422     Validation Loss: 3.944021
Epoch: 82   Training Accuracy: 0.120677     Validation Accuracy: 0.113772
Validation loss decreased (3.937121 --> 3.931134)  Saving model as model_scratch_val...

Epoch: 83   Training Loss: 3.927398     Validation Loss: 3.924654
Epoch: 83   Training Accuracy: 0.124719     Validation Accuracy: 0.123353
Accuracy has increased (0.122156 --> 0.123353)  Saving model as model_scratch_acc...
Validation loss decreased (3.931134 --> 3.923072)  Saving model as model_scratch_val...

Epoch: 84   Training Loss: 3.903913     Validation Loss: 3.922621
Epoch: 84   Training Accuracy: 0.125168     Validation Accuracy: 0.114970
Validation loss decreased (3.923072 --> 3.915552)  Saving model as model_scratch_val...

Epoch: 85   Training Loss: 3.896400     Validation Loss: 3.904023
Epoch: 85   Training Accuracy: 0.126666     Validation Accuracy: 0.122156
Validation loss decreased (3.915552 --> 3.902635)  Saving model as model_scratch_val...

Epoch: 86   Training Loss: 3.883279     Validation Loss: 3.913476
Epoch: 86   Training Accuracy: 0.126965     Validation Accuracy: 0.120958
Validation loss decreased (3.902635 --> 3.895025)  Saving model as model_scratch_val...

Epoch: 87   Training Loss: 3.867261     Validation Loss: 3.907522
Epoch: 87   Training Accuracy: 0.125468     Validation Accuracy: 0.116168
Validation loss decreased (3.895025 --> 3.878712)  Saving model as model_scratch_val...

Epoch: 88   Training Loss: 3.873288     Validation Loss: 3.871173
Epoch: 88   Training Accuracy: 0.130858     Validation Accuracy: 0.126946
Accuracy has increased (0.123353 --> 0.126946)  Saving model as model_scratch_acc...

Epoch: 89   Training Loss: 3.864966     Validation Loss: 3.852593
Epoch: 89   Training Accuracy: 0.128762     Validation Accuracy: 0.134132
Accuracy has increased (0.126946 --> 0.134132)  Saving model as model_scratch_acc...
Validation loss decreased (3.878712 --> 3.874895)  Saving model as model_scratch_val...

Epoch: 90   Training Loss: 3.848657     Validation Loss: 3.883047
Epoch: 90   Training Accuracy: 0.134302     Validation Accuracy: 0.124551
Validation loss decreased (3.874895 --> 3.865834)  Saving model as model_scratch_val...

Epoch: 91   Training Loss: 3.840996     Validation Loss: 3.861967
Epoch: 91   Training Accuracy: 0.135350     Validation Accuracy: 0.131737
Validation loss decreased (3.865834 --> 3.850665)  Saving model as model_scratch_val...

Epoch: 92   Training Loss: 3.845536     Validation Loss: 3.840424
Epoch: 92   Training Accuracy: 0.134002     Validation Accuracy: 0.132934

Epoch: 93   Training Loss: 3.827234     Validation Loss: 3.818939
Epoch: 93   Training Accuracy: 0.138194     Validation Accuracy: 0.138922
Accuracy has increased (0.134132 --> 0.138922)  Saving model as model_scratch_acc...
Validation loss decreased (3.850665 --> 3.833186)  Saving model as model_scratch_val...

Epoch: 94   Training Loss: 3.834230     Validation Loss: 3.832601
Epoch: 94   Training Accuracy: 0.132655     Validation Accuracy: 0.136527

Epoch: 95   Training Loss: 3.806633     Validation Loss: 3.848658
Epoch: 95   Training Accuracy: 0.135649     Validation Accuracy: 0.129341
Validation loss decreased (3.833186 --> 3.820892)  Saving model as model_scratch_val...

Epoch: 96   Training Loss: 3.800740     Validation Loss: 3.773103
Epoch: 96   Training Accuracy: 0.136547     Validation Accuracy: 0.134132
Validation loss decreased (3.820892 --> 3.804624)  Saving model as model_scratch_val...

Epoch: 97   Training Loss: 3.797831     Validation Loss: 3.802944
Epoch: 97   Training Accuracy: 0.141788     Validation Accuracy: 0.136527

Epoch: 98   Training Loss: 3.793869     Validation Loss: 3.797077
Epoch: 98   Training Accuracy: 0.140740     Validation Accuracy: 0.141317
Accuracy has increased (0.138922 --> 0.141317)  Saving model as model_scratch_acc...
Validation loss decreased (3.804624 --> 3.796730)  Saving model as model_scratch_val...

Epoch: 99   Training Loss: 3.780378     Validation Loss: 3.818412
Epoch: 99   Training Accuracy: 0.141788     Validation Accuracy: 0.134132
Validation loss decreased (3.796730 --> 3.785231)  Saving model as model_scratch_val...

Epoch: 100  Training Loss: 3.774595     Validation Loss: 3.801023
Epoch: 100  Training Accuracy: 0.139392     Validation Accuracy: 0.142515
Accuracy has increased (0.141317 --> 0.142515)  Saving model as model_scratch_acc...

Epoch: 101  Training Loss: 3.778807     Validation Loss: 3.769093
Epoch: 101  Training Accuracy: 0.145381     Validation Accuracy: 0.132934
Validation loss decreased (3.785231 --> 3.772684)  Saving model as model_scratch_val...

Epoch: 102  Training Loss: 3.763545     Validation Loss: 3.789487
Epoch: 102  Training Accuracy: 0.147777     Validation Accuracy: 0.141317
Validation loss decreased (3.772684 --> 3.766131)  Saving model as model_scratch_val...

Epoch: 103  Training Loss: 3.769912     Validation Loss: 3.727435
Epoch: 103  Training Accuracy: 0.141039     Validation Accuracy: 0.152096
Accuracy has increased (0.142515 --> 0.152096)  Saving model as model_scratch_acc...
Validation loss decreased (3.766131 --> 3.749369)  Saving model as model_scratch_val...

Epoch: 104  Training Loss: 3.750626     Validation Loss: 3.754451
Epoch: 104  Training Accuracy: 0.157958     Validation Accuracy: 0.142515

Epoch: 105  Training Loss: 3.740579     Validation Loss: 3.747797
Epoch: 105  Training Accuracy: 0.144183     Validation Accuracy: 0.153293
Accuracy has increased (0.152096 --> 0.153293)  Saving model as model_scratch_acc...
Validation loss decreased (3.749369 --> 3.741578)  Saving model as model_scratch_val...

Epoch: 106  Training Loss: 3.732043     Validation Loss: 3.750707
Epoch: 106  Training Accuracy: 0.152717     Validation Accuracy: 0.146108
Validation loss decreased (3.741578 --> 3.734856)  Saving model as model_scratch_val...

Epoch: 107  Training Loss: 3.727972     Validation Loss: 3.734976
Epoch: 107  Training Accuracy: 0.154963     Validation Accuracy: 0.152096

Epoch: 108  Training Loss: 3.732185     Validation Loss: 3.720115
Epoch: 108  Training Accuracy: 0.149124     Validation Accuracy: 0.154491
Accuracy has increased (0.153293 --> 0.154491)  Saving model as model_scratch_acc...
Validation loss decreased (3.734856 --> 3.718266)  Saving model as model_scratch_val...

Epoch: 109  Training Loss: 3.714134     Validation Loss: 3.745629
Epoch: 109  Training Accuracy: 0.155862     Validation Accuracy: 0.148503

Epoch: 110  Training Loss: 3.708722     Validation Loss: 3.724134
Epoch: 110  Training Accuracy: 0.154664     Validation Accuracy: 0.155689
Accuracy has increased (0.154491 --> 0.155689)  Saving model as model_scratch_acc...
Validation loss decreased (3.718266 --> 3.712750)  Saving model as model_scratch_val...

Epoch: 111  Training Loss: 3.701528     Validation Loss: 3.718828
Epoch: 111  Training Accuracy: 0.159156     Validation Accuracy: 0.158084
Accuracy has increased (0.155689 --> 0.158084)  Saving model as model_scratch_acc...
Validation loss decreased (3.712750 --> 3.705527)  Saving model as model_scratch_val...

Epoch: 112  Training Loss: 3.703289     Validation Loss: 3.709388
Epoch: 112  Training Accuracy: 0.148226     Validation Accuracy: 0.165269
Accuracy has increased (0.158084 --> 0.165269)  Saving model as model_scratch_acc...
Validation loss decreased (3.705527 --> 3.702551)  Saving model as model_scratch_val...

Epoch: 113  Training Loss: 3.685634     Validation Loss: 3.714809
Epoch: 113  Training Accuracy: 0.162000     Validation Accuracy: 0.159281
Validation loss decreased (3.702551 --> 3.693098)  Saving model as model_scratch_val...

Epoch: 114  Training Loss: 3.687310     Validation Loss: 3.674237
Epoch: 114  Training Accuracy: 0.156311     Validation Accuracy: 0.152096
Validation loss decreased (3.693098 --> 3.679544)  Saving model as model_scratch_val...

Epoch: 115  Training Loss: 3.679395     Validation Loss: 3.636306
Epoch: 115  Training Accuracy: 0.155263     Validation Accuracy: 0.161677

Epoch: 116  Training Loss: 3.673845     Validation Loss: 3.667615
Epoch: 116  Training Accuracy: 0.159904     Validation Accuracy: 0.164072
Validation loss decreased (3.679544 --> 3.677391)  Saving model as model_scratch_val...

Epoch: 117  Training Loss: 3.660675     Validation Loss: 3.651525
Epoch: 117  Training Accuracy: 0.160353     Validation Accuracy: 0.170060
Accuracy has increased (0.165269 --> 0.170060)  Saving model as model_scratch_acc...
Validation loss decreased (3.677391 --> 3.656473)  Saving model as model_scratch_val...

Epoch: 118  Training Loss: 3.653746     Validation Loss: 3.664187
Epoch: 118  Training Accuracy: 0.162899     Validation Accuracy: 0.160479

Epoch: 119  Training Loss: 3.645944     Validation Loss: 3.597809
Epoch: 119  Training Accuracy: 0.173080     Validation Accuracy: 0.168862
Validation loss decreased (3.656473 --> 3.649102)  Saving model as model_scratch_val...

Epoch: 120  Training Loss: 3.659712     Validation Loss: 3.636005
Epoch: 120  Training Accuracy: 0.160952     Validation Accuracy: 0.167665
Validation loss decreased (3.649102 --> 3.630691)  Saving model as model_scratch_val...

Epoch: 121  Training Loss: 3.644701     Validation Loss: 3.623431
Epoch: 121  Training Accuracy: 0.162599     Validation Accuracy: 0.170060
Validation loss decreased (3.630691 --> 3.630185)  Saving model as model_scratch_val...

Epoch: 122  Training Loss: 3.645003     Validation Loss: 3.612422
Epoch: 122  Training Accuracy: 0.167989     Validation Accuracy: 0.167665
Validation loss decreased (3.630185 --> 3.627559)  Saving model as model_scratch_val...

Epoch: 123  Training Loss: 3.636429     Validation Loss: 3.609053
Epoch: 123  Training Accuracy: 0.161551     Validation Accuracy: 0.170060
Validation loss decreased (3.627559 --> 3.612759)  Saving model as model_scratch_val...

Epoch: 124  Training Loss: 3.614089     Validation Loss: 3.653380
Epoch: 124  Training Accuracy: 0.164396     Validation Accuracy: 0.168862

Epoch: 125  Training Loss: 3.625476     Validation Loss: 3.650490
Epoch: 125  Training Accuracy: 0.163647     Validation Accuracy: 0.180838
Accuracy has increased (0.170060 --> 0.180838)  Saving model as model_scratch_acc...

Epoch: 126  Training Loss: 3.604398     Validation Loss: 3.594820
Epoch: 126  Training Accuracy: 0.171283     Validation Accuracy: 0.176048
Validation loss decreased (3.612759 --> 3.612396)  Saving model as model_scratch_val...

Epoch: 127  Training Loss: 3.595028     Validation Loss: 3.577291
Epoch: 127  Training Accuracy: 0.175775     Validation Accuracy: 0.179641
Validation loss decreased (3.612396 --> 3.579763)  Saving model as model_scratch_val...

Epoch: 128  Training Loss: 3.599304     Validation Loss: 3.595695
Epoch: 128  Training Accuracy: 0.172481     Validation Accuracy: 0.173653

Epoch: 129  Training Loss: 3.591354     Validation Loss: 3.586159
Epoch: 129  Training Accuracy: 0.168738     Validation Accuracy: 0.180838

Epoch: 130  Training Loss: 3.577348     Validation Loss: 3.616155
Epoch: 130  Training Accuracy: 0.176673     Validation Accuracy: 0.183234
Accuracy has increased (0.180838 --> 0.183234)  Saving model as model_scratch_acc...

Epoch: 131  Training Loss: 3.567074     Validation Loss: 3.531819
Epoch: 131  Training Accuracy: 0.179967     Validation Accuracy: 0.177246
Validation loss decreased (3.579763 --> 3.568812)  Saving model as model_scratch_val...

Epoch: 132  Training Loss: 3.564107     Validation Loss: 3.577753
Epoch: 132  Training Accuracy: 0.175625     Validation Accuracy: 0.183234
Validation loss decreased (3.568812 --> 3.566189)  Saving model as model_scratch_val...

Epoch: 133  Training Loss: 3.550288     Validation Loss: 3.554419
Epoch: 133  Training Accuracy: 0.177122     Validation Accuracy: 0.188024
Accuracy has increased (0.183234 --> 0.188024)  Saving model as model_scratch_acc...
Validation loss decreased (3.566189 --> 3.564850)  Saving model as model_scratch_val...

Epoch: 134  Training Loss: 3.567092     Validation Loss: 3.600030
Epoch: 134  Training Accuracy: 0.176673     Validation Accuracy: 0.174850
Validation loss decreased (3.564850 --> 3.555997)  Saving model as model_scratch_val...

Epoch: 135  Training Loss: 3.552739     Validation Loss: 3.551371
Epoch: 135  Training Accuracy: 0.175775     Validation Accuracy: 0.182036

Epoch: 136  Training Loss: 3.557916     Validation Loss: 3.565541
Epoch: 136  Training Accuracy: 0.175326     Validation Accuracy: 0.190419
Accuracy has increased (0.188024 --> 0.190419)  Saving model as model_scratch_acc...
Validation loss decreased (3.555997 --> 3.549866)  Saving model as model_scratch_val...

Epoch: 137  Training Loss: 3.545806     Validation Loss: 3.531090
Epoch: 137  Training Accuracy: 0.173379     Validation Accuracy: 0.179641
Validation loss decreased (3.549866 --> 3.531282)  Saving model as model_scratch_val...

Epoch: 138  Training Loss: 3.527649     Validation Loss: 3.551901
Epoch: 138  Training Accuracy: 0.183860     Validation Accuracy: 0.194012
Accuracy has increased (0.190419 --> 0.194012)  Saving model as model_scratch_acc...

Epoch: 139  Training Loss: 3.518505     Validation Loss: 3.516140
Epoch: 139  Training Accuracy: 0.187154     Validation Accuracy: 0.183234
Validation loss decreased (3.531282 --> 3.528402)  Saving model as model_scratch_val...

Epoch: 140  Training Loss: 3.535452     Validation Loss: 3.518444
Epoch: 140  Training Accuracy: 0.183411     Validation Accuracy: 0.184431
Validation loss decreased (3.528402 --> 3.522587)  Saving model as model_scratch_val...

Epoch: 141  Training Loss: 3.517627     Validation Loss: 3.484757
Epoch: 141  Training Accuracy: 0.182063     Validation Accuracy: 0.191617
Validation loss decreased (3.522587 --> 3.508568)  Saving model as model_scratch_val...

Epoch: 142  Training Loss: 3.512470     Validation Loss: 3.510522
Epoch: 142  Training Accuracy: 0.184010     Validation Accuracy: 0.190419
Validation loss decreased (3.508568 --> 3.502295)  Saving model as model_scratch_val...

Epoch: 143  Training Loss: 3.510679     Validation Loss: 3.526705
Epoch: 143  Training Accuracy: 0.187902     Validation Accuracy: 0.191617

Epoch: 144  Training Loss: 3.510440     Validation Loss: 3.487937
Epoch: 144  Training Accuracy: 0.186255     Validation Accuracy: 0.196407
Accuracy has increased (0.194012 --> 0.196407)  Saving model as model_scratch_acc...
Validation loss decreased (3.502295 --> 3.498636)  Saving model as model_scratch_val...

Epoch: 145  Training Loss: 3.496385     Validation Loss: 3.496437
Epoch: 145  Training Accuracy: 0.187154     Validation Accuracy: 0.198802
Accuracy has increased (0.196407 --> 0.198802)  Saving model as model_scratch_acc...
Validation loss decreased (3.498636 --> 3.493196)  Saving model as model_scratch_val...

Epoch: 146  Training Loss: 3.501367     Validation Loss: 3.448648
Epoch: 146  Training Accuracy: 0.188651     Validation Accuracy: 0.198802
Validation loss decreased (3.493196 --> 3.488006)  Saving model as model_scratch_val...

Epoch: 147  Training Loss: 3.495651     Validation Loss: 3.468408
Epoch: 147  Training Accuracy: 0.188501     Validation Accuracy: 0.207186
Accuracy has increased (0.198802 --> 0.207186)  Saving model as model_scratch_acc...
Validation loss decreased (3.488006 --> 3.476880)  Saving model as model_scratch_val...

Epoch: 148  Training Loss: 3.479345     Validation Loss: 3.488336
Epoch: 148  Training Accuracy: 0.187154     Validation Accuracy: 0.200000
Validation loss decreased (3.476880 --> 3.473121)  Saving model as model_scratch_val...

Epoch: 149  Training Loss: 3.471785     Validation Loss: 3.481541
Epoch: 149  Training Accuracy: 0.192843     Validation Accuracy: 0.196407

Epoch: 150  Training Loss: 3.468506     Validation Loss: 3.451855
Epoch: 150  Training Accuracy: 0.185507     Validation Accuracy: 0.197605
Validation loss decreased (3.473121 --> 3.469247)  Saving model as model_scratch_val...

Epoch: 151  Training Loss: 3.477448     Validation Loss: 3.450499
Epoch: 151  Training Accuracy: 0.186405     Validation Accuracy: 0.213174
Accuracy has increased (0.207186 --> 0.213174)  Saving model as model_scratch_acc...
Validation loss decreased (3.469247 --> 3.460272)  Saving model as model_scratch_val...

Epoch: 152  Training Loss: 3.460494     Validation Loss: 3.480053
Epoch: 152  Training Accuracy: 0.186405     Validation Accuracy: 0.202395
Validation loss decreased (3.460272 --> 3.457204)  Saving model as model_scratch_val...

Epoch: 153  Training Loss: 3.460124     Validation Loss: 3.442125
Epoch: 153  Training Accuracy: 0.188651     Validation Accuracy: 0.197605
Validation loss decreased (3.457204 --> 3.441875)  Saving model as model_scratch_val...

Epoch: 154  Training Loss: 3.462604     Validation Loss: 3.445044
Epoch: 154  Training Accuracy: 0.188950     Validation Accuracy: 0.201198

Epoch: 155  Training Loss: 3.455742     Validation Loss: 3.438510
Epoch: 155  Training Accuracy: 0.196437     Validation Accuracy: 0.192814
Validation loss decreased (3.441875 --> 3.429765)  Saving model as model_scratch_val...

Epoch: 156  Training Loss: 3.444303     Validation Loss: 3.428447
Epoch: 156  Training Accuracy: 0.190448     Validation Accuracy: 0.207186
Validation loss decreased (3.429765 --> 3.426168)  Saving model as model_scratch_val...

Epoch: 157  Training Loss: 3.439654     Validation Loss: 3.417716
Epoch: 157  Training Accuracy: 0.192095     Validation Accuracy: 0.197605

Epoch: 158  Training Loss: 3.418483     Validation Loss: 3.466569
Epoch: 158  Training Accuracy: 0.204372     Validation Accuracy: 0.203593
Validation loss decreased (3.426168 --> 3.424753)  Saving model as model_scratch_val...

Epoch: 159  Training Loss: 3.414056     Validation Loss: 3.466969
Epoch: 159  Training Accuracy: 0.197934     Validation Accuracy: 0.200000
Validation loss decreased (3.424753 --> 3.414974)  Saving model as model_scratch_val...

Epoch: 160  Training Loss: 3.422180     Validation Loss: 3.366662
Epoch: 160  Training Accuracy: 0.192244     Validation Accuracy: 0.205988
Validation loss decreased (3.414974 --> 3.403102)  Saving model as model_scratch_val...

Epoch: 161  Training Loss: 3.398428     Validation Loss: 3.381595
Epoch: 161  Training Accuracy: 0.203773     Validation Accuracy: 0.202395
Validation loss decreased (3.403102 --> 3.398518)  Saving model as model_scratch_val...

Epoch: 162  Training Loss: 3.405592     Validation Loss: 3.443935
Epoch: 162  Training Accuracy: 0.205121     Validation Accuracy: 0.202395

Epoch: 163  Training Loss: 3.405982     Validation Loss: 3.396415
Epoch: 163  Training Accuracy: 0.202276     Validation Accuracy: 0.210778

Epoch: 164  Training Loss: 3.391732     Validation Loss: 3.456382
Epoch: 164  Training Accuracy: 0.207067     Validation Accuracy: 0.211976
Validation loss decreased (3.398518 --> 3.387149)  Saving model as model_scratch_val...

Epoch: 165  Training Loss: 3.400285     Validation Loss: 3.414183
Epoch: 165  Training Accuracy: 0.198084     Validation Accuracy: 0.211976
Validation loss decreased (3.387149 --> 3.384973)  Saving model as model_scratch_val...

Epoch: 166  Training Loss: 3.388603     Validation Loss: 3.394268
Epoch: 166  Training Accuracy: 0.204971     Validation Accuracy: 0.204790
Validation loss decreased (3.384973 --> 3.382701)  Saving model as model_scratch_val...

Epoch: 167  Training Loss: 3.382042     Validation Loss: 3.410890
Epoch: 167  Training Accuracy: 0.209163     Validation Accuracy: 0.205988
Validation loss decreased (3.382701 --> 3.380821)  Saving model as model_scratch_val...

Epoch: 168  Training Loss: 3.367097     Validation Loss: 3.364518
Epoch: 168  Training Accuracy: 0.209612     Validation Accuracy: 0.219162
Accuracy has increased (0.213174 --> 0.219162)  Saving model as model_scratch_acc...
Validation loss decreased (3.380821 --> 3.376894)  Saving model as model_scratch_val...

Epoch: 169  Training Loss: 3.374738     Validation Loss: 3.404622
Epoch: 169  Training Accuracy: 0.197035     Validation Accuracy: 0.211976
Validation loss decreased (3.376894 --> 3.366131)  Saving model as model_scratch_val...

Epoch: 170  Training Loss: 3.372807     Validation Loss: 3.369359
Epoch: 170  Training Accuracy: 0.206468     Validation Accuracy: 0.202395
Validation loss decreased (3.366131 --> 3.364498)  Saving model as model_scratch_val...

Epoch: 171  Training Loss: 3.346749     Validation Loss: 3.344267
Epoch: 171  Training Accuracy: 0.212607     Validation Accuracy: 0.208383
Validation loss decreased (3.364498 --> 3.353850)  Saving model as model_scratch_val...

Epoch: 172  Training Loss: 3.344807     Validation Loss: 3.406757
Epoch: 172  Training Accuracy: 0.211708     Validation Accuracy: 0.207186

Epoch: 173  Training Loss: 3.351054     Validation Loss: 3.333519
Epoch: 173  Training Accuracy: 0.215002     Validation Accuracy: 0.207186
Validation loss decreased (3.353850 --> 3.344182)  Saving model as model_scratch_val...

Epoch: 174  Training Loss: 3.331668     Validation Loss: 3.358231
Epoch: 174  Training Accuracy: 0.210960     Validation Accuracy: 0.208383

Epoch: 175  Training Loss: 3.331441     Validation Loss: 3.319122
Epoch: 175  Training Accuracy: 0.210061     Validation Accuracy: 0.208383
Validation loss decreased (3.344182 --> 3.337987)  Saving model as model_scratch_val...

Epoch: 176  Training Loss: 3.333740     Validation Loss: 3.330969
Epoch: 176  Training Accuracy: 0.219045     Validation Accuracy: 0.213174
Validation loss decreased (3.337987 --> 3.333484)  Saving model as model_scratch_val...

Epoch: 177  Training Loss: 3.337514     Validation Loss: 3.299802
Epoch: 177  Training Accuracy: 0.209912     Validation Accuracy: 0.214371
Validation loss decreased (3.333484 --> 3.325973)  Saving model as model_scratch_val...

Epoch: 178  Training Loss: 3.342118     Validation Loss: 3.341868
Epoch: 178  Training Accuracy: 0.208714     Validation Accuracy: 0.205988
Validation loss decreased (3.325973 --> 3.323620)  Saving model as model_scratch_val...

Epoch: 179  Training Loss: 3.335632     Validation Loss: 3.342526
Epoch: 179  Training Accuracy: 0.211708     Validation Accuracy: 0.204790
Validation loss decreased (3.323620 --> 3.319835)  Saving model as model_scratch_val...

Epoch: 180  Training Loss: 3.311182     Validation Loss: 3.325745
Epoch: 180  Training Accuracy: 0.212906     Validation Accuracy: 0.211976
Validation loss decreased (3.319835 --> 3.314347)  Saving model as model_scratch_val...

Epoch: 181  Training Loss: 3.324093     Validation Loss: 3.277863
Epoch: 181  Training Accuracy: 0.218296     Validation Accuracy: 0.226347
Accuracy has increased (0.219162 --> 0.226347)  Saving model as model_scratch_acc...
Validation loss decreased (3.314347 --> 3.301194)  Saving model as model_scratch_val...

Epoch: 182  Training Loss: 3.312094     Validation Loss: 3.303442
Epoch: 182  Training Accuracy: 0.214553     Validation Accuracy: 0.208383

Epoch: 183  Training Loss: 3.303096     Validation Loss: 3.300206
Epoch: 183  Training Accuracy: 0.217248     Validation Accuracy: 0.219162

Epoch: 184  Training Loss: 3.291498     Validation Loss: 3.298159
Epoch: 184  Training Accuracy: 0.221141     Validation Accuracy: 0.203593
Validation loss decreased (3.301194 --> 3.288874)  Saving model as model_scratch_val...

Epoch: 185  Training Loss: 3.296758     Validation Loss: 3.273988
Epoch: 185  Training Accuracy: 0.220243     Validation Accuracy: 0.211976
Validation loss decreased (3.288874 --> 3.283160)  Saving model as model_scratch_val...

Epoch: 186  Training Loss: 3.294346     Validation Loss: 3.313086
Epoch: 186  Training Accuracy: 0.217847     Validation Accuracy: 0.216766

Epoch: 187  Training Loss: 3.282319     Validation Loss: 3.282534
Epoch: 187  Training Accuracy: 0.224585     Validation Accuracy: 0.210778

Epoch: 188  Training Loss: 3.273715     Validation Loss: 3.307482
Epoch: 188  Training Accuracy: 0.228926     Validation Accuracy: 0.210778
Validation loss decreased (3.283160 --> 3.278415)  Saving model as model_scratch_val...

Epoch: 189  Training Loss: 3.283456     Validation Loss: 3.252257
Epoch: 189  Training Accuracy: 0.228926     Validation Accuracy: 0.209581
Validation loss decreased (3.278415 --> 3.275857)  Saving model as model_scratch_val...

Epoch: 190  Training Loss: 3.281767     Validation Loss: 3.248670
Epoch: 190  Training Accuracy: 0.222189     Validation Accuracy: 0.229940
Accuracy has increased (0.226347 --> 0.229940)  Saving model as model_scratch_acc...
Validation loss decreased (3.275857 --> 3.272370)  Saving model as model_scratch_val...

Epoch: 191  Training Loss: 3.269533     Validation Loss: 3.261844
Epoch: 191  Training Accuracy: 0.229675     Validation Accuracy: 0.210778

Epoch: 192  Training Loss: 3.235482     Validation Loss: 3.244784
Epoch: 192  Training Accuracy: 0.231622     Validation Accuracy: 0.202395

Epoch: 193  Training Loss: 3.255410     Validation Loss: 3.213551
Epoch: 193  Training Accuracy: 0.229825     Validation Accuracy: 0.217964
Validation loss decreased (3.272370 --> 3.252377)  Saving model as model_scratch_val...

Epoch: 194  Training Loss: 3.253254     Validation Loss: 3.207265
Epoch: 194  Training Accuracy: 0.227878     Validation Accuracy: 0.221557
Validation loss decreased (3.252377 --> 3.249375)  Saving model as model_scratch_val...

Epoch: 195  Training Loss: 3.230494     Validation Loss: 3.222841
Epoch: 195  Training Accuracy: 0.232670     Validation Accuracy: 0.217964

Epoch: 196  Training Loss: 3.228616     Validation Loss: 3.230145
Epoch: 196  Training Accuracy: 0.229376     Validation Accuracy: 0.219162
Validation loss decreased (3.249375 --> 3.245753)  Saving model as model_scratch_val...

Epoch: 197  Training Loss: 3.230228     Validation Loss: 3.237736
Epoch: 197  Training Accuracy: 0.230424     Validation Accuracy: 0.222754
Validation loss decreased (3.245753 --> 3.240042)  Saving model as model_scratch_val...

Epoch: 198  Training Loss: 3.241547     Validation Loss: 3.224800
Epoch: 198  Training Accuracy: 0.223536     Validation Accuracy: 0.211976
Validation loss decreased (3.240042 --> 3.239954)  Saving model as model_scratch_val...

Epoch: 199  Training Loss: 3.227425     Validation Loss: 3.210567
Epoch: 199  Training Accuracy: 0.235365     Validation Accuracy: 0.221557
Validation loss decreased (3.239954 --> 3.224857)  Saving model as model_scratch_val...

Epoch: 200  Training Loss: 3.241978     Validation Loss: 3.189444
Epoch: 200  Training Accuracy: 0.223087     Validation Accuracy: 0.220359

Epoch: 201  Training Loss: 3.227196     Validation Loss: 3.229254
Epoch: 201  Training Accuracy: 0.232071     Validation Accuracy: 0.226347
Validation loss decreased (3.224857 --> 3.215815)  Saving model as model_scratch_val...

Epoch: 202  Training Loss: 3.213295     Validation Loss: 3.201240
Epoch: 202  Training Accuracy: 0.233268     Validation Accuracy: 0.214371

Epoch: 203  Training Loss: 3.209830     Validation Loss: 3.228813
Epoch: 203  Training Accuracy: 0.228627     Validation Accuracy: 0.214371

Epoch: 204  Training Loss: 3.205185     Validation Loss: 3.192276
Epoch: 204  Training Accuracy: 0.233568     Validation Accuracy: 0.211976

Epoch: 205  Training Loss: 3.197230     Validation Loss: 3.204510
Epoch: 205  Training Accuracy: 0.231322     Validation Accuracy: 0.216766
Validation loss decreased (3.215815 --> 3.210307)  Saving model as model_scratch_val...

Epoch: 206  Training Loss: 3.209473     Validation Loss: 3.192506
Epoch: 206  Training Accuracy: 0.237760     Validation Accuracy: 0.225150

Epoch: 207  Training Loss: 3.179350     Validation Loss: 3.229929
Epoch: 207  Training Accuracy: 0.235365     Validation Accuracy: 0.226347
Validation loss decreased (3.210307 --> 3.197178)  Saving model as model_scratch_val...

Epoch: 208  Training Loss: 3.163344     Validation Loss: 3.209142
Epoch: 208  Training Accuracy: 0.242551     Validation Accuracy: 0.214371
Validation loss decreased (3.197178 --> 3.191626)  Saving model as model_scratch_val...

Epoch: 209  Training Loss: 3.175554     Validation Loss: 3.183289
Epoch: 209  Training Accuracy: 0.241204     Validation Accuracy: 0.237126
Accuracy has increased (0.229940 --> 0.237126)  Saving model as model_scratch_acc...

Epoch: 210  Training Loss: 3.184031     Validation Loss: 3.160475
Epoch: 210  Training Accuracy: 0.231322     Validation Accuracy: 0.221557

Epoch: 211  Training Loss: 3.178214     Validation Loss: 3.117187
Epoch: 211  Training Accuracy: 0.235963     Validation Accuracy: 0.234731
Validation loss decreased (3.191626 --> 3.162499)  Saving model as model_scratch_val...

Epoch: 212  Training Loss: 3.167099     Validation Loss: 3.153887
Epoch: 212  Training Accuracy: 0.243899     Validation Accuracy: 0.229940

Epoch: 213  Training Loss: 3.165128     Validation Loss: 3.150539
Epoch: 213  Training Accuracy: 0.241204     Validation Accuracy: 0.227545
Validation loss decreased (3.162499 --> 3.154212)  Saving model as model_scratch_val...

Epoch: 214  Training Loss: 3.176518     Validation Loss: 3.218671
Epoch: 214  Training Accuracy: 0.237610     Validation Accuracy: 0.227545

Epoch: 215  Training Loss: 3.166126     Validation Loss: 3.131978
Epoch: 215  Training Accuracy: 0.236712     Validation Accuracy: 0.234731

Epoch: 216  Training Loss: 3.164419     Validation Loss: 3.120225
Epoch: 216  Training Accuracy: 0.239257     Validation Accuracy: 0.227545

Epoch: 217  Training Loss: 3.159211     Validation Loss: 3.144697
Epoch: 217  Training Accuracy: 0.246594     Validation Accuracy: 0.231138
Validation loss decreased (3.154212 --> 3.144161)  Saving model as model_scratch_val...

Epoch: 218  Training Loss: 3.156017     Validation Loss: 3.134391
Epoch: 218  Training Accuracy: 0.246444     Validation Accuracy: 0.229940

Epoch: 219  Training Loss: 3.144932     Validation Loss: 3.127830
Epoch: 219  Training Accuracy: 0.244647     Validation Accuracy: 0.235928

Epoch: 220  Training Loss: 3.142137     Validation Loss: 3.163154
Epoch: 220  Training Accuracy: 0.243000     Validation Accuracy: 0.237126

Epoch: 221  Training Loss: 3.136135     Validation Loss: 3.117907
Epoch: 221  Training Accuracy: 0.247792     Validation Accuracy: 0.228743
Validation loss decreased (3.144161 --> 3.138102)  Saving model as model_scratch_val...

Epoch: 222  Training Loss: 3.130488     Validation Loss: 3.135773
Epoch: 222  Training Accuracy: 0.252433     Validation Accuracy: 0.229940
Validation loss decreased (3.138102 --> 3.130898)  Saving model as model_scratch_val...

Epoch: 223  Training Loss: 3.128628     Validation Loss: 3.115881
Epoch: 223  Training Accuracy: 0.241204     Validation Accuracy: 0.231138
Validation loss decreased (3.130898 --> 3.123908)  Saving model as model_scratch_val...

Epoch: 224  Training Loss: 3.124078     Validation Loss: 3.170024
Epoch: 224  Training Accuracy: 0.251235     Validation Accuracy: 0.240719
Accuracy has increased (0.237126 --> 0.240719)  Saving model as model_scratch_acc...

Epoch: 225  Training Loss: 3.106574     Validation Loss: 3.129433
Epoch: 225  Training Accuracy: 0.250936     Validation Accuracy: 0.239521
Validation loss decreased (3.123908 --> 3.121264)  Saving model as model_scratch_val...

Epoch: 226  Training Loss: 3.110851     Validation Loss: 3.164701
Epoch: 226  Training Accuracy: 0.249439     Validation Accuracy: 0.244311
Accuracy has increased (0.240719 --> 0.244311)  Saving model as model_scratch_acc...

Epoch: 227  Training Loss: 3.132787     Validation Loss: 3.082833
Epoch: 227  Training Accuracy: 0.239257     Validation Accuracy: 0.252695
Accuracy has increased (0.244311 --> 0.252695)  Saving model as model_scratch_acc...
Validation loss decreased (3.121264 --> 3.109714)  Saving model as model_scratch_val...

Epoch: 228  Training Loss: 3.115636     Validation Loss: 3.117486
Epoch: 228  Training Accuracy: 0.247342     Validation Accuracy: 0.241916
Validation loss decreased (3.109714 --> 3.108695)  Saving model as model_scratch_val...

Epoch: 229  Training Loss: 3.100663     Validation Loss: 3.065283
Epoch: 229  Training Accuracy: 0.251834     Validation Accuracy: 0.245509
Validation loss decreased (3.108695 --> 3.108042)  Saving model as model_scratch_val...

Epoch: 230  Training Loss: 3.081341     Validation Loss: 3.121293
Epoch: 230  Training Accuracy: 0.255128     Validation Accuracy: 0.243114
Validation loss decreased (3.108042 --> 3.103988)  Saving model as model_scratch_val...

Epoch: 231  Training Loss: 3.087677     Validation Loss: 3.054191
Epoch: 231  Training Accuracy: 0.256775     Validation Accuracy: 0.245509
Validation loss decreased (3.103988 --> 3.103697)  Saving model as model_scratch_val...

Epoch: 232  Training Loss: 3.096594     Validation Loss: 3.091248
Epoch: 232  Training Accuracy: 0.255427     Validation Accuracy: 0.244311
Validation loss decreased (3.103697 --> 3.095810)  Saving model as model_scratch_val...

Epoch: 233  Training Loss: 3.084939     Validation Loss: 3.122943
Epoch: 233  Training Accuracy: 0.259320     Validation Accuracy: 0.235928
Validation loss decreased (3.095810 --> 3.093107)  Saving model as model_scratch_val...

Epoch: 234  Training Loss: 3.071101     Validation Loss: 3.088638
Epoch: 234  Training Accuracy: 0.268154     Validation Accuracy: 0.243114
Validation loss decreased (3.093107 --> 3.084819)  Saving model as model_scratch_val...

Epoch: 235  Training Loss: 3.090981     Validation Loss: 3.101937
Epoch: 235  Training Accuracy: 0.259769     Validation Accuracy: 0.241916
Validation loss decreased (3.084819 --> 3.079987)  Saving model as model_scratch_val...

Epoch: 236  Training Loss: 3.056785     Validation Loss: 3.066923
Epoch: 236  Training Accuracy: 0.257973     Validation Accuracy: 0.235928
Validation loss decreased (3.079987 --> 3.074361)  Saving model as model_scratch_val...

Epoch: 237  Training Loss: 3.088672     Validation Loss: 3.036625
Epoch: 237  Training Accuracy: 0.262764     Validation Accuracy: 0.259880
Accuracy has increased (0.252695 --> 0.259880)  Saving model as model_scratch_acc...

Epoch: 238  Training Loss: 3.047379     Validation Loss: 3.082693
Epoch: 238  Training Accuracy: 0.266956     Validation Accuracy: 0.255090
Validation loss decreased (3.074361 --> 3.060797)  Saving model as model_scratch_val...

Epoch: 239  Training Loss: 3.053712     Validation Loss: 3.044106
Epoch: 239  Training Accuracy: 0.257673     Validation Accuracy: 0.247904

Epoch: 240  Training Loss: 3.045533     Validation Loss: 3.044294
Epoch: 240  Training Accuracy: 0.259171     Validation Accuracy: 0.264671
Accuracy has increased (0.259880 --> 0.264671)  Saving model as model_scratch_acc...
Validation loss decreased (3.060797 --> 3.045634)  Saving model as model_scratch_val...

Epoch: 241  Training Loss: 3.039705     Validation Loss: 3.112259
Epoch: 241  Training Accuracy: 0.268154     Validation Accuracy: 0.243114

Epoch: 242  Training Loss: 3.053699     Validation Loss: 3.025354
Epoch: 242  Training Accuracy: 0.264261     Validation Accuracy: 0.245509

Epoch: 243  Training Loss: 3.046883     Validation Loss: 3.046190
Epoch: 243  Training Accuracy: 0.257524     Validation Accuracy: 0.251497

Epoch: 244  Training Loss: 3.049020     Validation Loss: 3.060204
Epoch: 244  Training Accuracy: 0.260668     Validation Accuracy: 0.253892

Epoch: 245  Training Loss: 3.052105     Validation Loss: 3.027340
Epoch: 245  Training Accuracy: 0.258272     Validation Accuracy: 0.255090
Validation loss decreased (3.045634 --> 3.020443)  Saving model as model_scratch_val...

Epoch: 246  Training Loss: 3.034375     Validation Loss: 3.007017
Epoch: 246  Training Accuracy: 0.264710     Validation Accuracy: 0.252695

Epoch: 247  Training Loss: 3.022372     Validation Loss: 3.020182
Epoch: 247  Training Accuracy: 0.270549     Validation Accuracy: 0.253892

Epoch: 248  Training Loss: 3.020650     Validation Loss: 3.004272
Epoch: 248  Training Accuracy: 0.269801     Validation Accuracy: 0.252695

Epoch: 249  Training Loss: 3.015954     Validation Loss: 3.038483
Epoch: 249  Training Accuracy: 0.269501     Validation Accuracy: 0.251497

Epoch: 250  Training Loss: 3.037405     Validation Loss: 3.033498
Epoch: 250  Training Accuracy: 0.263662     Validation Accuracy: 0.252695

Epoch: 251  Training Loss: 3.018105     Validation Loss: 3.023722
Epoch: 251  Training Accuracy: 0.271148     Validation Accuracy: 0.262275

Epoch: 252  Training Loss: 2.997476     Validation Loss: 2.989423
Epoch: 252  Training Accuracy: 0.272795     Validation Accuracy: 0.261078
Validation loss decreased (3.020443 --> 3.009181)  Saving model as model_scratch_val...

Epoch: 253  Training Loss: 3.011675     Validation Loss: 2.989468
Epoch: 253  Training Accuracy: 0.266657     Validation Accuracy: 0.249102

Epoch: 254  Training Loss: 2.990109     Validation Loss: 2.995637
Epoch: 254  Training Accuracy: 0.275790     Validation Accuracy: 0.255090

Epoch: 255  Training Loss: 3.000075     Validation Loss: 3.015027
Epoch: 255  Training Accuracy: 0.270849     Validation Accuracy: 0.262275

Epoch: 256  Training Loss: 3.001055     Validation Loss: 3.037740
Epoch: 256  Training Accuracy: 0.274891     Validation Accuracy: 0.250299

Epoch: 257  Training Loss: 2.995479     Validation Loss: 2.974404
Epoch: 257  Training Accuracy: 0.275191     Validation Accuracy: 0.263473

Epoch: 258  Training Loss: 3.010075     Validation Loss: 2.966813
Epoch: 258  Training Accuracy: 0.266657     Validation Accuracy: 0.247904

Epoch: 259  Training Loss: 2.971134     Validation Loss: 2.957285
Epoch: 259  Training Accuracy: 0.275341     Validation Accuracy: 0.264671
Validation loss decreased (3.009181 --> 2.996492)  Saving model as model_scratch_val...

Epoch: 260  Training Loss: 2.990199     Validation Loss: 3.008039
Epoch: 260  Training Accuracy: 0.265309     Validation Accuracy: 0.265868
Accuracy has increased (0.264671 --> 0.265868)  Saving model as model_scratch_acc...

Epoch: 261  Training Loss: 2.975455     Validation Loss: 2.932802
Epoch: 261  Training Accuracy: 0.270400     Validation Accuracy: 0.263473
Validation loss decreased (2.996492 --> 2.987573)  Saving model as model_scratch_val...

Epoch: 262  Training Loss: 2.961632     Validation Loss: 2.984690
Epoch: 262  Training Accuracy: 0.277437     Validation Accuracy: 0.267066
Accuracy has increased (0.265868 --> 0.267066)  Saving model as model_scratch_acc...

Epoch: 263  Training Loss: 2.970838     Validation Loss: 2.983090
Epoch: 263  Training Accuracy: 0.274293     Validation Accuracy: 0.262275
Validation loss decreased (2.987573 --> 2.975762)  Saving model as model_scratch_val...

Epoch: 264  Training Loss: 2.961947     Validation Loss: 2.995247
Epoch: 264  Training Accuracy: 0.273394     Validation Accuracy: 0.261078

Epoch: 265  Training Loss: 2.962421     Validation Loss: 2.943192
Epoch: 265  Training Accuracy: 0.273993     Validation Accuracy: 0.268263
Accuracy has increased (0.267066 --> 0.268263)  Saving model as model_scratch_acc...

Epoch: 266  Training Loss: 2.955692     Validation Loss: 2.983354
Epoch: 266  Training Accuracy: 0.283276     Validation Accuracy: 0.256287
Validation loss decreased (2.975762 --> 2.971960)  Saving model as model_scratch_val...

Epoch: 267  Training Loss: 2.961222     Validation Loss: 2.941499
Epoch: 267  Training Accuracy: 0.277736     Validation Accuracy: 0.274251
Accuracy has increased (0.268263 --> 0.274251)  Saving model as model_scratch_acc...

Epoch: 268  Training Loss: 2.943230     Validation Loss: 3.016429
Epoch: 268  Training Accuracy: 0.280731     Validation Accuracy: 0.256287

Epoch: 269  Training Loss: 2.966477     Validation Loss: 3.001827
Epoch: 269  Training Accuracy: 0.280431     Validation Accuracy: 0.268263

Epoch: 270  Training Loss: 2.947989     Validation Loss: 2.944552
Epoch: 270  Training Accuracy: 0.275490     Validation Accuracy: 0.273054
Validation loss decreased (2.971960 --> 2.967297)  Saving model as model_scratch_val...

Epoch: 271  Training Loss: 2.941188     Validation Loss: 2.969306
Epoch: 271  Training Accuracy: 0.282677     Validation Accuracy: 0.262275

Epoch: 272  Training Loss: 2.948247     Validation Loss: 2.917534
Epoch: 272  Training Accuracy: 0.280731     Validation Accuracy: 0.263473
Validation loss decreased (2.967297 --> 2.954233)  Saving model as model_scratch_val...

Epoch: 273  Training Loss: 2.933625     Validation Loss: 2.938378
Epoch: 273  Training Accuracy: 0.282677     Validation Accuracy: 0.275449
Accuracy has increased (0.274251 --> 0.275449)  Saving model as model_scratch_acc...
Validation loss decreased (2.954233 --> 2.949634)  Saving model as model_scratch_val...

Epoch: 274  Training Loss: 2.937973     Validation Loss: 2.960940
Epoch: 274  Training Accuracy: 0.283725     Validation Accuracy: 0.271856

Epoch: 275  Training Loss: 2.931550     Validation Loss: 2.950349
Epoch: 275  Training Accuracy: 0.284923     Validation Accuracy: 0.281437
Accuracy has increased (0.275449 --> 0.281437)  Saving model as model_scratch_acc...

Epoch: 276  Training Loss: 2.923791     Validation Loss: 3.044055
Epoch: 276  Training Accuracy: 0.283725     Validation Accuracy: 0.265868

Epoch: 277  Training Loss: 2.913700     Validation Loss: 2.938901
Epoch: 277  Training Accuracy: 0.280731     Validation Accuracy: 0.282635
Accuracy has increased (0.281437 --> 0.282635)  Saving model as model_scratch_acc...
Validation loss decreased (2.949634 --> 2.942584)  Saving model as model_scratch_val...

Epoch: 278  Training Loss: 2.925501     Validation Loss: 2.898729
Epoch: 278  Training Accuracy: 0.279533     Validation Accuracy: 0.276647
Validation loss decreased (2.942584 --> 2.939137)  Saving model as model_scratch_val...

Epoch: 279  Training Loss: 2.893425     Validation Loss: 2.954740
Epoch: 279  Training Accuracy: 0.293457     Validation Accuracy: 0.282635
Validation loss decreased (2.939137 --> 2.931625)  Saving model as model_scratch_val...

Epoch: 280  Training Loss: 2.916728     Validation Loss: 2.930152
Epoch: 280  Training Accuracy: 0.285522     Validation Accuracy: 0.282635

Epoch: 281  Training Loss: 2.900486     Validation Loss: 2.896766
Epoch: 281  Training Accuracy: 0.281779     Validation Accuracy: 0.273054

Epoch: 282  Training Loss: 2.909466     Validation Loss: 2.901819
Epoch: 282  Training Accuracy: 0.290163     Validation Accuracy: 0.273054
Validation loss decreased (2.931625 --> 2.925920)  Saving model as model_scratch_val...

Epoch: 283  Training Loss: 2.879655     Validation Loss: 2.953244
Epoch: 283  Training Accuracy: 0.287318     Validation Accuracy: 0.275449

Epoch: 284  Training Loss: 2.910829     Validation Loss: 2.893557
Epoch: 284  Training Accuracy: 0.286121     Validation Accuracy: 0.292216
Accuracy has increased (0.282635 --> 0.292216)  Saving model as model_scratch_acc...
Validation loss decreased (2.925920 --> 2.923718)  Saving model as model_scratch_val...

Epoch: 285  Training Loss: 2.898269     Validation Loss: 2.907529
Epoch: 285  Training Accuracy: 0.295553     Validation Accuracy: 0.277844

Epoch: 286  Training Loss: 2.892020     Validation Loss: 2.905437
Epoch: 286  Training Accuracy: 0.297350     Validation Accuracy: 0.276647
Validation loss decreased (2.923718 --> 2.921438)  Saving model as model_scratch_val...

Epoch: 287  Training Loss: 2.888301     Validation Loss: 2.949574
Epoch: 287  Training Accuracy: 0.286869     Validation Accuracy: 0.267066

Epoch: 288  Training Loss: 2.871979     Validation Loss: 2.915198
Epoch: 288  Training Accuracy: 0.301093     Validation Accuracy: 0.276647
Validation loss decreased (2.921438 --> 2.909799)  Saving model as model_scratch_val...

Epoch: 289  Training Loss: 2.872260     Validation Loss: 2.917272
Epoch: 289  Training Accuracy: 0.290313     Validation Accuracy: 0.270659
Validation loss decreased (2.909799 --> 2.904593)  Saving model as model_scratch_val...

Epoch: 290  Training Loss: 2.884797     Validation Loss: 2.890345
Epoch: 290  Training Accuracy: 0.289564     Validation Accuracy: 0.286228
Validation loss decreased (2.904593 --> 2.898360)  Saving model as model_scratch_val...

Epoch: 291  Training Loss: 2.865433     Validation Loss: 2.882917
Epoch: 291  Training Accuracy: 0.298697     Validation Accuracy: 0.273054

Epoch: 292  Training Loss: 2.862329     Validation Loss: 2.926948
Epoch: 292  Training Accuracy: 0.296152     Validation Accuracy: 0.283832

Epoch: 293  Training Loss: 2.888048     Validation Loss: 2.889356
Epoch: 293  Training Accuracy: 0.291660     Validation Accuracy: 0.286228
Validation loss decreased (2.898360 --> 2.893616)  Saving model as model_scratch_val...

Epoch: 294  Training Loss: 2.849733     Validation Loss: 2.882064
Epoch: 294  Training Accuracy: 0.298099     Validation Accuracy: 0.281437

Epoch: 295  Training Loss: 2.876844     Validation Loss: 2.900681
Epoch: 295  Training Accuracy: 0.289864     Validation Accuracy: 0.271856

Epoch: 296  Training Loss: 2.860848     Validation Loss: 2.897152
Epoch: 296  Training Accuracy: 0.297200     Validation Accuracy: 0.279042

Epoch: 297  Training Loss: 2.860211     Validation Loss: 2.868611
Epoch: 297  Training Accuracy: 0.293158     Validation Accuracy: 0.282635
Validation loss decreased (2.893616 --> 2.883212)  Saving model as model_scratch_val...

Epoch: 298  Training Loss: 2.856004     Validation Loss: 2.838748
Epoch: 298  Training Accuracy: 0.294206     Validation Accuracy: 0.285030

Epoch: 299  Training Loss: 2.843515     Validation Loss: 2.868899
Epoch: 299  Training Accuracy: 0.294805     Validation Accuracy: 0.279042

Epoch: 300  Training Loss: 2.841376     Validation Loss: 2.799439
Epoch: 300  Training Accuracy: 0.300344     Validation Accuracy: 0.274251
Validation loss decreased (2.883212 --> 2.878668)  Saving model as model_scratch_val...


Training complete in 451m 40s
Best validation accuracy: 0.292216
Best accuracy epoch     : 284
Best validation loss    : 2.878668
Best validation epoch   : 300
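The log above interleaves two independent checkpoint criteria: the model is saved as `model_scratch_acc` whenever validation accuracy reaches a new best, and as `model_scratch_val` whenever the tracked validation loss reaches a new low (note the compared loss values, e.g. `3.319835 --> 3.314347`, differ from the raw per-epoch loss, so a smoothed or re-averaged loss appears to be tracked). A minimal, framework-agnostic sketch of that bookkeeping is below; `CheckpointTracker` and its `save_fn` callback are hypothetical names, standing in for a call like `torch.save(model.state_dict(), path)`:

```python
class CheckpointTracker:
    """Tracks best validation loss and accuracy, mirroring the log's two save rules."""

    def __init__(self, save_fn):
        self.best_loss = float("inf")   # best (lowest) tracked validation loss so far
        self.best_acc = 0.0             # best (highest) validation accuracy so far
        self.save_fn = save_fn          # callback invoked with the checkpoint filename

    def update(self, val_loss, val_acc):
        """Check both criteria for one epoch; return the log messages emitted."""
        messages = []
        if val_acc > self.best_acc:
            messages.append(
                f"Accuracy has increased ({self.best_acc:.6f} --> {val_acc:.6f})  "
                "Saving model as model_scratch_acc..."
            )
            self.best_acc = val_acc
            self.save_fn("model_scratch_acc.pt")
        if val_loss < self.best_loss:
            messages.append(
                f"Validation loss decreased ({self.best_loss:.6f} --> {val_loss:.6f})  "
                "Saving model as model_scratch_val..."
            )
            self.best_loss = val_loss
            self.save_fn("model_scratch_val.pt")
        return messages
```

Keeping the two checkpoints separate is deliberate: the lowest-loss model and the highest-accuracy model need not coincide (here they land at epochs 300 and 284 respectively), so both are retained and can be compared on the test set afterwards.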